Background compile shaders

rpgplayerrobin
Gnoll
Posts: 619
Joined: Wed Mar 18, 2009 3:03 am
x 353

Background compile shaders

Post by rpgplayerrobin »

Ogre Version: 1.12.13
Operating System: Windows 10
Render System: Direct3D9

Hey!

I am trying to compile shaders on background threads (continuing the off-topic discussion from this thread: viewtopic.php?t=96657).

It sounded like 1.12.13 could handle this easily, but now I am not so sure.

How do you actually background load a resource group, or at least just the shaders?
There is no tutorial as far as I can see, not even in the 13.3.4 version. I have searched through many posts and only found examples of loading one resource at a time (and seemingly only for manually created resources), but nothing about loading a whole resource group that way.

This is how I normally load a resource group:

Code: Select all

tmpResourceGroupManager.initialiseResourceGroup(m_groupName);
tmpResourceGroupManager.loadResourceGroup(m_groupName, true, true);

This is how I tried to do it now instead:

Code: Select all

ResourceBackgroundQueue& tmpResourceBackgroundQueue = ResourceBackgroundQueue::getSingleton();
BackgroundProcessTicket tmpTicket = tmpResourceBackgroundQueue.initialiseResourceGroup(m_groupName);
while (!tmpResourceBackgroundQueue.isProcessComplete(tmpTicket))
	Ogre::Root::getSingleton().getWorkQueue()->processResponses();

tmpTicket = tmpResourceBackgroundQueue.loadResourceGroup(m_groupName);
while (!tmpResourceBackgroundQueue.isProcessComplete(tmpTicket))
	Ogre::Root::getSingleton().getWorkQueue()->processResponses();

However, when I time it, it is exactly as slow as the non-background-loaded version.
My guess is that it only creates and uses one single worker thread, so this would probably only pay off if I queued multiple resource groups at once before calling processResponses (see the sketch below).
I was expecting it to create threads automatically based on how many resources there were to load, but that does not seem to be what is happening.
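To be clear, something like this is what I imagine would actually benefit from the background queue (just a rough sketch, not tested; "m_groupNames" is a made-up list of group names):

Code: Select all

std::vector<Ogre::BackgroundProcessTicket> tmpTickets;
Ogre::ResourceBackgroundQueue& tmpQueue = Ogre::ResourceBackgroundQueue::getSingleton();

// Queue several groups first, so the worker always has something to do
for (const Ogre::String& tmpName : m_groupNames)
	tmpTickets.push_back(tmpQueue.initialiseResourceGroup(tmpName));

// Do other main-thread work here (loading screen etc.), then poll
bool tmpAllDone = false;
while (!tmpAllDone)
{
	Ogre::Root::getSingleton().getWorkQueue()->processResponses();

	tmpAllDone = true;
	for (size_t i = 0; i < tmpTickets.size(); ++i)
		if (!tmpQueue.isProcessComplete(tmpTickets[i]))
			tmpAllDone = false;
}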

I also tried setting the number of worker threads at startup to a higher value, but it made no difference in speed:

Code: Select all

// This is done directly after root is created
int threadCount = OGRE_THREAD_HARDWARE_CONCURRENCY;
DefaultWorkQueueBase* tmpDefaultWorkQueueBase = (DefaultWorkQueueBase*)m_Root->getWorkQueue();
tmpDefaultWorkQueueBase->setWorkerThreadCount(threadCount);

The compiled version of my OgreSDK is using "#define OGRE_THREAD_SUPPORT 3".
I have debugged it and it is definitely threaded (_threadMain and its functions are used for example).

So, instead of using the background queue like this, I attempted a threaded prepare on only my shaders after initialiseResourceGroup has been called, but it seems initialiseResourceGroup already triggers prepare (it calls D3D9HLSLProgram::prepareImpl for the shaders and compiles the microcode).
That makes it a race condition on the prepare function, which is not what we want of course.

So how do you actually prepare shaders with this system from a resource group?
It seems impossible to get a ResourcePtr from the shader program to attempt to prepare it in a background thread, since the resource is not created before initialiseResourceGroup, which in turn calls prepare...

As loath pointed out, initialiseResourceGroup should not call prepare according to here: https://ogrecave.github.io/ogre/api/lat ... ement.html.
But it actually does when I debug it.
The only way for it to skip the prepare call in initialiseResourceGroup in the Ogre source code is if the resource's mIsManual variable is true, but that variable is private and cannot be altered after creation, so doing it in the resourceCreated listener function is impossible.
If it were possible in that function to stop initialiseResourceGroup from calling prepare, that could potentially solve the entire issue, I guess?
Should I alter the Ogre source to add a new variable to each resource for this purpose, or just make mIsManual public and temporarily alter it while loading? I don't even know if that is correct or whether it would work.

How are you actually supposed to handle this?
Must I create/declare all shader resources manually at startup (which would be a huge list I would have to write and update by hand every time I add or remove a shader) and then call prepare for all of them (roughly like the sketch at the end of this post)?

This is how loath is doing it, and sajjadcsharp does it for textures at viewtopic.php?p=448655.
But to do that in a correct manner I would instead have to write a new parser for all shader programs, parse them myself, declare everything automatically that way, and background load those before calling initialiseResourceGroup on the resource group (though initialiseResourceGroup might actually create them again... this is hard).
But that is exactly what the standard parser is made for (for .material files, for example), and it seems very weird to have to read them twice in two different ways (once internally in Ogre and once in user code).
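Just to show what I mean by declaring them manually, this is roughly what I picture for a single shader (only a sketch, the names are made up and I have not tested it):

Code: Select all

// Declare the program by hand instead of letting a .material/.program script do it
HighLevelGpuProgramPtr tmpProgram = HighLevelGpuProgramManager::getSingleton().createProgram(
	"MyShader_VS", m_groupName, "hlsl", GPT_VERTEX_PROGRAM);
tmpProgram->setSourceFile("MyShader.hlsl");
tmpProgram->setParameter("entry_point", "main");
tmpProgram->setParameter("target", "vs_3_0");

// Compile it in the background instead of on the main thread
ResourceBackgroundQueue::getSingleton().prepare(
	"HighLevelGpuProgram", tmpProgram->getName(), tmpProgram->getGroup());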

paroj
OGRE Team Member
Posts: 1994
Joined: Sun Mar 30, 2014 2:51 pm
x 1074

Re: Background compile shaders

Post by paroj »

Code: Select all

BackgroundProcessTicket tmpTicket = tmpResourceBackgroundQueue.initialiseResourceGroup(m_groupName);
while (!tmpResourceBackgroundQueue.isProcessComplete(tmpTicket))
	Ogre::Root::getSingleton().getWorkQueue()->processResponses();

generally, threading is of no use if you just wait for completion after starting the thread. The idea is that the call returns immediately so you can do something else meanwhile.

Yes, we could just multi-thread initialiseResourceGroup itself instead of merely sending it to the background. However, there are issues preventing that:

  • all resources in a single .material/ .program file must be created in order as they may depend on each other
  • only scripts of the same type may be processed in parallel. E.g. a .material file may only be processed after all .program files are done.

The preparing of GpuPrograms at initialiseResourceGroup time is caused by default parameters. To parse those we need to prepare the GpuProgram to check whether such a parameter actually exists. If you do not declare any default parameters, the GpuProgram should be left unloaded.

loath
Platinum Sponsor
Posts: 290
Joined: Tue Jan 17, 2012 5:18 am
x 67

Re: Background compile shaders

Post by loath »

i debugged this to see why initializeResourceGroups () might be calling prepare () as it does not appear to be DIRECTLY doing so from a quick code review. the issue is your scripts likely include shader parameters. when the script parser creates the shader parameters this internally triggers the call to prepare (). Ogre::HighLevelGpuProgram::createParameters () needs the compiled program so it can match the source code's variables to the script's declared variables.

my shaders are generated on the fly so i don't assign any program parameters until AFTER compilation is complete. this doesn't help you but it explains why i don't see this issue. (and in fact this reminded me of why i have to assign the parameters after compilation)
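roughly this flow, if it helps to see it (a simplified sketch, not my actual code; "generatedSource" is just a placeholder):

Code: Select all

// 1. create the program with no default parameters at all
Ogre::HighLevelGpuProgramPtr prog = Ogre::HighLevelGpuProgramManager::getSingleton ().createProgram (
    "my_generated_vs", "General", "hlsl", Ogre::GPT_VERTEX_PROGRAM);
prog->setSource (generatedSource); // generated on the fly
prog->setParameter ("entry_point", "main");
prog->setParameter ("target", "vs_3_0");

// 2. compile on a worker thread
prog->prepare (true);

// 3. only AFTER compilation do i assign parameters (back on the main thread)
prog->getDefaultParameters ()->setNamedAutoConstant (
    "worldViewProj", Ogre::GpuProgramParameters::ACT_WORLDVIEWPROJ_MATRIX);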

that being said... i'm sure there is a solution here for you without too much work. let me do some more digging.

loath
Platinum Sponsor
Posts: 290
Joined: Tue Jan 17, 2012 5:18 am
x 67

Re: Background compile shaders

Post by loath »

first idea:

  1. the first idea that comes to mind is to separate your shader declarations into *.program files (instead of *.material files). see https://ogrecave.github.io/ogre/api/lat ... grams.html where it says:

    You define the program in exactly the same way whether you use a .program script or a .material script, the only difference is that all .program scripts are guaranteed to have been parsed before all .material scripts, so you can guarantee that your program has been defined before any .material script that might use it.

  2. remove any default parameters in *.program files so that prepare () is not called when they are loaded. place the parameters in the material script shader reference instead. i tested this with a simple .program and .material and it appears prepare () is called when the material is processed. ideally each of these *.program scripts would be in a separate resource group so you could then call:

    Code: Select all

    tmpResourceGroupManager.initialiseResourceGroup(m_groupName);
  3. now iterate all of the unloaded shader Ogre::ResourcePtr / Ogre::GpuProgramPtr in the Ogre::GpuProgramManager and either use a custom c++ thread queue (just a simple std::vector+std::mutex+std::threads via .join()) or using the Ogre::ResourceBackgroundQueue:

    Code: Select all

    resource->setBackgroundLoaded (true);
    auto ticket = Ogre::ResourceBackgroundQueue::getSingleton ().prepare (resource->getCreator ()->getResourceType (), resource->getName (), resource->getGroup (), false, NULL, NULL, this);
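the iteration itself could look roughly like this (untested sketch; "this" is assumed to be your Ogre::ResourceBackgroundQueue::Listener, adjust to your setup):

Code: Select all

auto it = Ogre::GpuProgramManager::getSingleton ().getResourceIterator ();
while (it.hasMoreElements ())
{
    Ogre::ResourcePtr resource = it.getNext ();
    if (resource->isLoaded () || resource->isPrepared ())
        continue; // only queue the ones that still need compiling

    resource->setBackgroundLoaded (true);
    Ogre::ResourceBackgroundQueue::getSingleton ().prepare (
        resource->getCreator ()->getResourceType (), resource->getName (),
        resource->getGroup (), false, NULL, NULL, this);
}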

(i can post more code if this approach in general is acceptable)

loath
Platinum Sponsor
Posts: 290
Joined: Tue Jan 17, 2012 5:18 am
x 67

Re: Background compile shaders

Post by loath »

second idea:

  1. place all your shader .material / .program files in a common filesystem hierarchy. example: c:\myapp\media\programs\*. then manually parse these .material or .program files (rolling your own simple .material parser just for basic shader data or by ripping from Ogre::GpuProgramTranslator). create each shader program while ignoring any default parameters (to avoid forcing prepare () to be called). set all the required parameters (ex. vertex vs fragment shader, shader generation like vs 3.0, entry point, etc). use something like https://en.cppreference.com/w/cpp/files ... y_iterator to locate all the files (see the sketch after this list).

  2. then compile with c++ threads or the Ogre::ResourceBackgroundQueue as above.

  3. load the .materials and .program files using your old techniques. since the compiled programs exist in the Ogre::GpuProgramManager the default parameters etc will be properly added per the script files.
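for locating the files in step 1, something like this works (c++17 sketch, not compiled here):

Code: Select all

#include <filesystem>
#include <string>
#include <vector>

std::vector<std::string> FindShaderScripts (const std::string& root)
{
    std::vector<std::string> files;
    for (const auto& entry : std::filesystem::recursive_directory_iterator (root))
    {
        if (!entry.is_regular_file ())
            continue;

        const auto ext = entry.path ().extension ();
        if (ext == ".program" || ext == ".material")
            files.push_back (entry.path ().string ());
    }
    return files;
}

// example: FindShaderScripts ("c:\\myapp\\media\\programs");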

rpgplayerrobin
Gnoll
Posts: 619
Joined: Wed Mar 18, 2009 3:03 am
x 353

Re: Background compile shaders

Post by rpgplayerrobin »

Thank you for your answers! :D
It has helped a lot in trying to figure out how to do this.

Now, after a while of coding, I think I have found a possible solution that avoids multiple materials and any special load order, while still supporting default parameters in the materials.

just a simple std::vector+std::mutex+std::threads via .join()

Do I need to use a mutex in order to compile shaders? Can't many shaders compile at the same time?
Is it because of D3D9HLSLProgram::prepareImpl adding/getting the microcode to/from the cache?
Are D3D9HLSLProgram::prepareImpl/D3D9HLSLProgram::compileMicrocode not thread safe when multiple threads use them?
In that case, how would I handle this? If they are not thread safe for multiple threads at once, how can I ever load them faster than a synchronous compiler would?
With my current solution, when I load the shaders synchronously and exit the game, my shader cache is identical to when I do the same thing with multiple threads, so I guess that is an indicator that it is thread safe from multiple threads at the same time?

My current approach is that I add a new event to the Ogre source, fired right before it tries to set the default parameters. I listen for it with ScriptCompilerListener::handleEvent, copy what I need (the compiler, the program and its params), and then return true so the Ogre source does not go on to set the default parameters itself (more detail below).

That means I get the program and its parameters in my user code, so I can set them later.
Also, in ScriptCompilerListener::handleEvent, I just hand its prepare function to a background thread (either the Ogre WorkQueue or my own thread manager).

Later, when the shader has been compiled (signalled via ResourceBackgroundQueue::Listener::operationCompleted, or simply by waiting for a thread to finish with my own thread manager), I add the parameter list back to it by running the same code Ogre would have run, using the "compiler" and "params" values I saved earlier (via GpuProgramTranslator::translateProgramParameters).
That way I can load shader materials that have parameters in them without needing a special loading scheme with multiple materials or a certain order.

I only do the above when I want it for a resource group, and that resource group must only contain .hlsl files and the .material files that load those shaders (nothing else: no textures, normal materials, compositors or anything).
That way I know for sure it cannot be messed up by any load order.

Here are my results (in seconds) when I profile the loading of that resource group full of shaders with different methods:

Using no threading: 20.300716
Using Ogre WorkQueue with 6 threads: 20.101999
Using Ogre WorkQueue with 12 threads: 19.973398
Using Ogre WorkQueue with 50 threads: 19.875628
Using my own thread manager with 6 threads: 6.532257
Using my own thread manager with 12 threads: 6.384386
Using my own thread manager with 50 threads: 6.079521

Updating or not updating the loading screen has no significant effect on the timestamps above for any test I tried.

For Ogre WorkQueue, I add each shader (HighLevelGpuProgramPtr) to a new "tmpResourceBackgroundQueue.prepare" call.
I find it kind of strange that the Ogre WorkQueue way is so slow... If I set it to use 12 threads it still only seems to compile one shader at a time, and it even seems to compile them in the order I added them (but I might be wrong), whereas I would expect some to take longer than others and therefore finish out of order.
I really don't understand how the WorkQueue works, because how can it be almost as slow with 6 threads as with 50 threads? Is it really only using one thread even if I set it to more?
And how can it almost always be just as slow as not using any threading at all?
Also, calling "prog->setBackgroundLoaded(true);" before "tmpResourceBackgroundQueue.prepare" seems to make no difference in speed.

For my own thread manager, I add each shader (HighLevelGpuProgramPtr) to a new thread that calls resource->prepare, and I make sure only X threads run at a time.
So in short, my own thread manager is 3 times faster than the WorkQueue, and I don't really understand why.
Here is an explanation of my current code in detail:

Here is how to use my code on a resource group (with only shaders in it), pretty simple:

Code: Select all

CShaderBackgroundLoader* tmpShaderBackgroundLoader = new CShaderBackgroundLoader();
tmpResourceGroupManager.initialiseResourceGroup(m_groupName);
delete tmpShaderBackgroundLoader;
tmpResourceGroupManager.loadResourceGroup(m_groupName, true, true);

I had to alter a few things in the Ogre Source to be able to do this, here are the changes:

OgreScriptCompiler.h addition:

Code: Select all

class _OgreExport SetDefaultParametersGpuProgramScriptCompilerEvent : public ScriptCompilerEvent
{
public:
	GpuProgram* mProgram;
	AbstractNodePtr mParams;
	static String eventType;

	SetDefaultParametersGpuProgramScriptCompilerEvent(GpuProgram* program, AbstractNodePtr params)
		:ScriptCompilerEvent(eventType), mProgram(program), mParams(params) {}
};

OgreScriptCompiler.cpp addition:

Code: Select all

String SetDefaultParametersGpuProgramScriptCompilerEvent::eventType = "setDefaultParametersGpuProgram";

GpuProgramTranslator::translateGpuProgram change (so we can choose to not set the default parameters on the shader):

Code: Select all

SetDefaultParametersGpuProgramScriptCompilerEvent evtSDP(prog, params);
if (!compiler->_fireEvent(&evtSDP, 0))
{
	// Set up default parameters
	if (prog->isSupported() && params)
	{
		GpuProgramParametersSharedPtr ptr = prog->getDefaultParameters();
		GpuProgramTranslator::translateProgramParameters(compiler, ptr, static_cast<ObjectAbstractNode*>(params.get()));
	}
}

This means we need to reach GpuProgramTranslator::translateProgramParameters in our code, but that is impossible as it cannot be reached by the user (that class is not exported), so I had to alter that as well:

OgreBuiltinScriptTranslators.h GpuProgramTranslator translateProgramParameters removal/commented out, since we cannot reach it from our user-code anyway:

Code: Select all

//static void translateProgramParameters(ScriptCompiler *compiler, GpuProgramParametersSharedPtr params, ObjectAbstractNode *obj);

OgreScriptTranslator.h ScriptTranslator:

Code: Select all

static void translateProgramParameters(ScriptCompiler *compiler, GpuProgramParametersSharedPtr params, ObjectAbstractNode *obj);

I also then replaced "GpuProgramTranslator::translateProgramParameters" with "ScriptTranslator::translateProgramParameters" everywhere in OgreScriptTranslator.cpp.

Here is my user code. I have a define (MY_OWN_THREADED_STUFF) that I can remove to compare against the speed of the normal WorkQueue as well.
CShaderBackgroundLoader.cpp

Code: Select all

#define MY_OWN_THREADED_STUFF

// Constructor for the CShaderBackgroundLoader
CShaderBackgroundLoader::CShaderBackgroundLoader()
{
	// Reset the resources count
	m_resourcesCount = 0;

	// Set the script compiler listener
	ScriptCompilerManager::getSingleton().setListener(this);
}

// Destructor for the CShaderBackgroundLoader
CShaderBackgroundLoader::~CShaderBackgroundLoader()
{
	// Wait until all threaded resources are loaded
	WaitUntilAllCompleted();

	// Remove the script compiler listener
	ScriptCompilerManager::getSingleton().setListener(NULL);

	// Clear the list of resources
#ifdef MY_OWN_THREADED_STUFF
	m_resources.Clear();
#else
	m_resources.clear();
#endif
}

// This gets called when a shader has compiled in the background, and this is always called on the main thread
#ifndef MY_OWN_THREADED_STUFF
void CShaderBackgroundLoader::operationCompleted(Ogre::WorkQueue::RequestID id, const Ogre::BackgroundProcessResult& result)
{
	// Get the current resource
	CContent& tmpContent = m_resources[id];

// Set the error to it (if any)
if (result.error ||
	result.message != "")
	tmpContent.m_errorMessage = result.message == "" ? "Unknown error" : result.message;

if (app->m_GUIManager->m_screen_loadingScreen)
	app->m_GUIManager->m_screen_loadingScreen->SetStartupLoadingPercentage();

// There is no need to handle the error message as it already gets into the log automatically
//if (tmpContent.m_errorMessage != "")
//	CGeneric::ShowMessage(tmpContent.m_resource->getName() + ", compile error: " + tmpContent.m_errorMessage);

// Set the text of the loading screen
if (app->m_GUIManager->m_screen_loadingScreen)
	app->m_GUIManager->m_screen_loadingScreen->AddLogText("Compiled shader: " + tmpContent.m_resource->getName());

// Update loading screen
if (app->m_GUIManager->m_screen_loadingScreen)
	app->m_GUIManager->m_screen_loadingScreen->UpdateLoading();

// This has no effect on speed for some reason when I profile it, it should, but it does not
// Load the resource (this is faster to do here while other threads are running than to call loadResourceGroup at the end)
//if (tmpContent.m_errorMessage != "")
//	tmpContent.m_resource->load();

// Set up default parameters
if (tmpContent.m_resource->isSupported() && tmpContent.m_params)
{
	GpuProgramParametersSharedPtr ptr = tmpContent.m_resource->getDefaultParameters();
	ScriptTranslator::translateProgramParameters(tmpContent.m_compiler, ptr, static_cast<ObjectAbstractNode*>(tmpContent.m_params.get()));
}

// Remove the resource from the list
m_resources.erase(id);
}
#endif

class CResourceManager_ThreadClass
{
public:
	static CList<CShaderBackgroundLoader::CContent>* list;

static void Finish(int index)
{
	CShaderBackgroundLoader::CContent& tmpObj = (*list)[index];

	if (app->m_GUIManager->m_screen_loadingScreen)
		app->m_GUIManager->m_screen_loadingScreen->SetStartupLoadingPercentage();

	// There is no need to handle the error message as it already gets into the log automatically
	//if (content.m_errorMessage != "")
	//	CGeneric::ShowMessage(content.m_resource->getName() + ", compile error: " + content.m_errorMessage);

	// Set the text of the loading screen
	if (app->m_GUIManager->m_screen_loadingScreen)
		app->m_GUIManager->m_screen_loadingScreen->AddLogText("Compiled shader: " + tmpObj.m_resource->getName());
	/*		app->m_GUIManager->m_screen_loadingScreen->AddLogText("Compiled shader (" +
	#ifdef MY_OWN_THREADED_STUFF
						CGeneric::ToString(m_resourcesCount--) + "/" + CGeneric::ToString(int(m_resources.Size())) + "): " + content.m_resource->getName());
	#else
						CGeneric::ToString(int(m_resources.size() - 1)) + "/" + CGeneric::ToString(m_resourcesCount) + "): " + content.m_resource->getName());
	#endif*/

	// Update loading screen
	if (app->m_GUIManager->m_screen_loadingScreen)
		app->m_GUIManager->m_screen_loadingScreen->UpdateLoading();

	// This has no effect on speed for some reason when I profile it, it should, but it does not
	// Load the resource (this is faster to do here while other threads are running than to call loadResourceGroup at the end)
	//if (content.m_errorMessage != "")
	//	content.m_resource->load();

	// Set up default parameters
	if (tmpObj.m_resource->isSupported() && tmpObj.m_params)
	{
		GpuProgramParametersSharedPtr ptr = tmpObj.m_resource->getDefaultParameters();
		ScriptTranslator::translateProgramParameters(tmpObj.m_compiler, ptr, static_cast<ObjectAbstractNode*>(tmpObj.m_params.get()));
	}
}

static void CreateThread(int index, std::list<std::future<void>>& answers)
{
	answers.push_back(std::async(std::launch::async, &DoThreadedAction, index));
}

static void DoThreadedAction(int index)
{
	// Compile the shader
	CShaderBackgroundLoader::CContent& tmpContent = (*list)[index];
	try
	{
		tmpContent.m_resource->prepare(true);
	}
	catch (Exception& e)
	{
		tmpContent.m_errorMessage = e.getFullDescription();
	}
}
};
CList<CShaderBackgroundLoader::CContent>* CResourceManager_ThreadClass::list = NULL;

// Waits until all threads are done
void CShaderBackgroundLoader::WaitUntilAllCompleted()
{
#ifdef MY_OWN_THREADED_STUFF
	// Compile all shaders using threads

	// Do the threaded actions until they are done
	CResourceManager_ThreadClass::list = &m_resources;
	CThreadManager::SetupAndRunUntilDone(12, CResourceManager_ThreadClass::list->Size(), &CResourceManager_ThreadClass::Finish, &CResourceManager_ThreadClass::CreateThread);
	CResourceManager_ThreadClass::list = NULL;
#else
	// Start looping
	ResourceBackgroundQueue& tmpResourceBackgroundQueue = ResourceBackgroundQueue::getSingleton();
	while (true)
	{
		// Loop through all resources and check if we have compiled all
		bool tmpAllDone = true;
		for (std::unordered_map<Ogre::WorkQueue::RequestID, CContent>::const_iterator tmpItr = m_resources.begin(); tmpItr != m_resources.end(); ++tmpItr)
		{
			Ogre::WorkQueue::RequestID tmpVar = tmpItr->first;
			if (!tmpResourceBackgroundQueue.isProcessComplete(tmpVar))
			{
				tmpAllDone = false;
				break;
			}
		}
		if (tmpAllDone)
			// Break out of the loop, we are done
			break;

		// Continue compiling the shaders in the background
		Ogre::Root::getSingleton().getWorkQueue()->processResponses();

		// Update loading screen
		//if (app->m_GUIManager->m_screen_loadingScreen)
		//	app->m_GUIManager->m_screen_loadingScreen->UpdateLoading();
	}
#endif
}

bool CShaderBackgroundLoader::handleEvent(ScriptCompiler *compiler, ScriptCompilerEvent *_evt, void *retval)
{
	// Check if this event is trying to set default parameters
	if (_evt->mType == "setDefaultParametersGpuProgram")
	{
		// Get the event
		SetDefaultParametersGpuProgramScriptCompilerEvent* tmpEvent = (SetDefaultParametersGpuProgramScriptCompilerEvent*)_evt;

		// Check if the event has a valid program
		if (tmpEvent->mProgram)
		{
			// Compile the shader in a background thread
			Ogre::HighLevelGpuProgramPtr tmpProgram = HighLevelGpuProgramManager::getSingleton().getByName(tmpEvent->mProgram->getName());
#ifdef MY_OWN_THREADED_STUFF
			m_resources.Add(CContent(tmpProgram, tmpEvent->mParams, compiler));
#else
			ResourceBackgroundQueue& tmpResourceBackgroundQueue = ResourceBackgroundQueue::getSingleton();
			//prog->setBackgroundLoaded(true);
			BackgroundProcessTicket tmpTicket = tmpResourceBackgroundQueue.prepare("HighLevelGpuProgram", tmpProgram->getName(), tmpProgram->getGroup(), false, NULL, NULL, this);
			CContent& tmpContent = m_resources[tmpTicket];
			tmpContent.m_resource = tmpProgram;
			tmpContent.m_compiler = compiler;
			tmpContent.m_params = tmpEvent->mParams;
#endif
			m_resourcesCount++;
		}

		// Return that we want to skip setting the default parameters.
		// We now have all of its content, so we can set them when we feel like it instead.
		return true;
	}

	// Return that we should not skip this event
	return false;
}

CShaderBackgroundLoader.h

Code: Select all


#define MY_OWN_THREADED_STUFF
class CShaderBackgroundLoader
	: public Ogre::ScriptCompilerListener
#ifndef MY_OWN_THREADED_STUFF
	, Ogre::ResourceBackgroundQueue::Listener
#endif
{
public:
	class CContent
	{
	public:
#ifdef MY_OWN_THREADED_STUFF
		CContent(Ogre::HighLevelGpuProgramPtr resource, AbstractNodePtr params, ScriptCompiler* compiler)
		{
			m_resource = resource;
			m_params = params;
			m_compiler = compiler;

		m_errorMessage = "";
	}
#endif

	~CContent()
	{
		m_resource.reset();
		m_params.reset();
		m_compiler = NULL;
	}

	Ogre::HighLevelGpuProgramPtr m_resource;
	AbstractNodePtr m_params;
	ScriptCompiler* m_compiler;
	CString m_errorMessage;
};

#ifdef MY_OWN_THREADED_STUFF
	CList<CContent> m_resources;
#else
	std::unordered_map<Ogre::WorkQueue::RequestID, CContent> m_resources;
#endif
	int m_resourcesCount;
	CString m_type;

CShaderBackgroundLoader();

~CShaderBackgroundLoader();

#ifndef MY_OWN_THREADED_STUFF
	void operationCompleted(Ogre::WorkQueue::RequestID id, const Ogre::BackgroundProcessResult& result);
#endif

void WaitUntilAllCompleted();

bool handleEvent(ScriptCompiler *compiler, ScriptCompilerEvent *_evt, void *retval);
};

Since the script compiler is only created once (in ScriptCompilerManager::ScriptCompilerManager) and seemingly never destroyed until Ogre shuts down, we can store it in a pointer.

Its "mGroup" is also only set once, since we are doing this thing for the entire resource group until it is done, which means we can use it however we like as long as we don't create another resource group and try to load/initialize it while we are compiling these shaders.
That means we must load the entire resource group before continuing (which is fine, since we want to compile all shaders before we try to use/render them anyway).

It also seems the params will not get destroyed while I hold on to them, since they are only destroyed when I release my reference (at least I think that is correct, since they get destroyed in _Decref when I call "m_params.reset()" later on).

Since we use initializeResourceGroup synchronously and set the params synchronously when the shader has been compiled, we never use "params" or "compiler" in a thread outside of the main thread.

Now I am just worried that there is something I have missed that makes it not thread safe.
Is there something I might have done wrong?
I mean, everything in the game seems to work, but I am still worried about bugs that might appear in the future because of this threaded stuff.

I also wonder why Ogre WorkQueue is so slow compared to my own version of it.

loath
Platinum Sponsor
Posts: 290
Joined: Tue Jan 17, 2012 5:18 am
x 67

Re: Background compile shaders

Post by loath »

Now, after a while of coding, I think I have found a possible solution that avoids multiple materials and any special load order, while still supporting default parameters in the materials.

with the "first idea" above you can still use default parameters in materials (i.e. in the "vertex_program_ref" or "fragment_program_ref" sections.) just not in the shader "default_params" for the shader declaration. ogre will order your .program creation before the .material creation.

Code: Select all

// fake material illustrating "first idea"

// PLACE THIS IN A .PROGRAM FILE
vertex_program shader/ocean_vp hlsl
{
    source ocean.fx
    entry_point main
    target vs_3_0

    default_params
    {
         // THIS MUST BE BLANK TO AVOID PREPARE() CALLS
    }
}

// PLACE THIS IN A .PROGRAM FILE
fragment_program shader/ocean_fp hlsl
{
    source ocean.fx
    entry_point main
    target ps_3_0

    default_params
    {
         // THIS MUST BE BLANK TO AVOID PREPARE() CALLS
    }
}

// PLACE THIS IN A .MATERIAL FILE
material ocean
{
    technique
    {
        pass
        {
            vertex_program_ref shader/ocean_vp
            {
                // PARAMS HERE ARE OK
                param_named_auto    worldViewProj       worldviewproj_matrix
            }

            fragment_program_ref shader/ocean_fp
            {
                // PARAMS HERE ARE OK
                param_named_auto    lightColor          light_diffuse_colour 0
            }

            texture_unit
            {
                texture water.dds
            }
        }
    }
}

Do I need to use a mutex in order to compile shaders? Can't many shaders compile at the same time?

no, you don't need a lock as long as you're not accessing the unordered_map in the worker threads WHILE modifying it from the main thread. the prepare () call itself is already properly synchronized so you can compile the same program twice from different threads and everything will be fine.
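for example, something like this keeps the worker threads away from the map entirely (sketch only, using your unordered_map variant; needs <thread> and <vector>):

Code: Select all

std::vector<std::thread> workers;

// main thread: copy the shared pointer out, one per worker
for (auto& entry : m_resources)
{
    Ogre::HighLevelGpuProgramPtr prog = entry.second.m_resource; // copied by value
    workers.emplace_back ([prog] () { prog->prepare (true); });
}

// (in practice you would cap the number of threads like you already do)
for (auto& t : workers)
    t.join ();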

My current approach is that I add a new event to the Ogre source, fired right before it tries to set the default parameters. I listen for it with ScriptCompilerListener::handleEvent, copy what I need (the compiler, the program and its params), and then return true so the Ogre source does not go on to set the default parameters itself (more detail below).

yes, i noticed this event didn't allow you to control the flow. your change makes sense. (i didn't suggest this route to avoid changes in ogre)

Using no threading: 20.300716
Using Ogre WorkQueue with 6 threads: 20.101999
Using Ogre WorkQueue with 12 threads: 19.973398
Using Ogre WorkQueue with 50 threads: 19.875628
Using my own thread manager with 6 threads: 6.532257
Using my own thread manager with 12 threads: 6.384386
Using my own thread manager with 50 threads: 6.079521

excellent results!

i use my own threads to compile shaders so i have not seen the WorkQueue issue you describe. i do use the work queue to background load everything else (textures, skeletons, meshes, etc), which also improves startup time. (although not as dramatically as shader compilation)

I really don't understand how the WorkQueue works, because how can it be almost as slow with 6 threads as with 50 threads? Is it really only using one thread even if I set it to more?

this is very strange. put a breakpoint on the loadHighLevel () call with the WorkQueue == 6 scenario and see what thread is being used for each call. is it still on the main thread? or an ogre spawned worker thread? back when i implemented my multithreaded shader i basically did this to find every case where i was accidentally compiling on the main thread which causes noticeable frame hiccups.

(i'll look at your code and post any comments in a separate post)

Last edited by loath on Tue May 24, 2022 1:04 am, edited 2 times in total.
loath
Platinum Sponsor
Posts: 290
Joined: Tue Jan 17, 2012 5:18 am
x 67

Re: Background compile shaders

Post by loath »

here is what i meant by a simple c++ approach if you want a producer / consumer pattern. if you're just passing the Ogre::ResourcePtr into the worker thread (which is actually what i do in my code) then you obviously don't need the vector or the lock: (typed but not compiled or tested)

Code: Select all

#include <mutex>
#include <thread>
#include <vector>

#include <OGRE/OgreGpuProgram.h>

std::vector<Ogre::GpuProgramPtr> programs_;
std::mutex lock_;
 
void worker ()
{
    Ogre::GpuProgramPtr program;
    while (true)
    {
        // scope for lock
        {
            std::lock_guard<std::mutex> guard(lock_);
            if (programs_.empty())
                return; // close down thread, all work is done
            // get the next program
            program = programs_.back();
            programs_.pop_back();
        }   // lock released

        // compile (mutex not held while we compile)
        program->prepare ();
    }
}

int main()
{
    // fill the "programs_" vector with all the programs you want to compile
    programs_ = GetAllUncompiledPrograms ();

    // create and execute the threads
    std::thread thread1 (worker);
    std::thread thread2 (worker);
    std::thread thread3 (worker);
    thread1.join();
    thread2.join();
    thread3.join();

    // ALL PROGRAMS ARE COMPILED NOW
    return 0;
}
Last edited by loath on Mon May 23, 2022 10:09 pm, edited 2 times in total.
loath
Platinum Sponsor
Posts: 290
Joined: Tue Jan 17, 2012 5:18 am
x 67

Re: Background compile shaders

Post by loath »

  1. personally i would try for an approach that does not require changes to ogre. it's too much work in the future in my opinion when upgrading. if this is acceptable to you, however, then your code looks good to me.

  2. i would use the code under the MY_OWN_THREADED_STUFF. having to poll the Ogre::WorkQueue looks slow even though i highly doubt this is the cause of the threading performance issues. if this container has 100 shaders it's a waste to acquire 100 locks, check the status, and resume.

  3. as for locking issues it's hard for me to tell the exact flow threading-wise from the code here. as long as you are not modifying the unordered_map while reading it inside the worker threads you are good to go. best practice would be to pass only the (integer) index and the Ogre::ResourcePtr into the worker thread and don't access the unordered_map unless you're on the main thread. otherwise the code looks great and you should not have any threading issues.

  4. i would pass in the expected group name here as well. this always comes back to haunt me when i don't do it.

    Code: Select all

    		// Compile the shader in a background thread
    		Ogre::HighLevelGpuProgramPtr tmpProgram = HighLevelGpuProgramManager::getSingleton().
    		          getByName(tmpEvent->mProgram->getName(), PLACE_EXPECTED_GROUP_NAME_HERE);
  5. @paroj, it does look like the background loading flag in Ogre::Resource isn't really used anymore and can be removed or deprecated?

rpgplayerrobin
Gnoll
Posts: 619
Joined: Wed Mar 18, 2009 3:03 am
x 353

Re: Background compile shaders

Post by rpgplayerrobin »

Thank you for your answers! :D

with the "first idea" above you can still use default parameters in materials (i.e. in the "vertex_program_ref" or "fragment_program_ref" sections.) just not in the shader "default_params" for the shader declaration. ogre will order your .program creation before the .material creation.

Code: Select all

// fake material illustrating "first idea"

// PLACE THIS IN A .PROGRAM FILE
vertex_program shader/ocean_vp hlsl
{
    source ocean.fx
    entry_point main
    target vs_3_0

    default_params
    {
         // THIS MUST BE BLANK TO AVOID PREPARE() CALLS
    }
}

// PLACE THIS IN A .PROGRAM FILE
fragment_program shader/ocean_fp hlsl
{
    source ocean.fx
    entry_point main
    target ps_3_0

    default_params
    {
         // THIS MUST BE BLANK TO AVOID PREPARE() CALLS
    }
}

// PLACE THIS IN A .MATERIAL FILE
material ocean
{
    technique
    {
        pass
        {
            vertex_program_ref shader/ocean_vp
            {
                // PARAMS HERE ARE OK
                param_named_auto    worldViewProj       worldviewproj_matrix
            }

            fragment_program_ref shader/ocean_fp
            {
                // PARAMS HERE ARE OK
                param_named_auto    lightColor          light_diffuse_colour 0
            }

            texture_unit
            {
                texture water.dds
            }
        }
    }
}

Yeah exactly. But I use those shaders in many materials, and my shaders usually have over 50 different params, which would make it extremely ugly if I had to repeat them in every material.
I would have to use material inheritance in that case, but that also adds an extra material, which is a no-go for me since it is pretty ugly to need an additional material per shader instead of zero. Also, I don't really like material inheritance, and it might interfere with my code in ways I cannot see right now.

this is very strange. put a breakpoint on the loadHighLevel () call with the WorkQueue == 6 scenario and see what thread is being used for each call. is it still on the main thread? or an ogre spawned worker thread? back when i implemented my multithreaded shader i basically did this to find every case where i was accidentally compiling on the main thread which causes noticeable frame hiccups.

HighLevelGpuProgram::loadHighLevel is called on the main thread, but it does not do anything there. It just returns in prepare since the shader has already been loaded in the background before it gets here.
I put a breakpoint in D3D9HLSLProgram::prepareImpl and it seems to come from a thread:

Code: Select all

RenderSystem_Direct3D9_d.dll!Ogre::D3D9HLSLProgram::prepareImpl() Line 49	C++
OgreMain_d.dll!Ogre::Resource::prepare(bool background) Line 125	C++
OgreMain_d.dll!Ogre::ResourceManager::prepare(const std::basic_string<char,std::char_traits<char>,std::allocator<char> > & name, const std::basic_string<char,std::char_traits<char>,std::allocator<char> > & group, bool isManual, Ogre::ManualResourceLoader * loader, const std::map<std::basic_string<char,std::char_traits<char>,std::allocator<char> >,std::basic_string<char,std::char_traits<char>,std::allocator<char> >,std::less<std::basic_string<char,std::char_traits<char>,std::allocator<char> > >,std::allocator<std::pair<std::basic_string<char,std::char_traits<char>,std::allocator<char> > const ,std::basic_string<char,std::char_traits<char>,std::allocator<char> > > > > * loadParams, bool backgroundThread) Line 93	C++
OgreMain_d.dll!Ogre::ResourceBackgroundQueue::handleRequest(const Ogre::WorkQueue::Request * req, const Ogre::WorkQueue * srcQ) Line 343	C++
OgreMain_d.dll!Ogre::DefaultWorkQueueBase::RequestHandlerHolder::handleRequest(const Ogre::WorkQueue::Request * req, const Ogre::WorkQueue * srcQ) Line 538	C++
OgreMain_d.dll!Ogre::DefaultWorkQueueBase::processRequest(Ogre::WorkQueue::Request * r) Line 663	C++
OgreMain_d.dll!Ogre::DefaultWorkQueueBase::processRequestResponse(Ogre::WorkQueue::Request * r, bool synchronous) Line 529	C++
OgreMain_d.dll!Ogre::DefaultWorkQueueBase::_processNextRequest() Line 525	C++
OgreMain_d.dll!Ogre::DefaultWorkQueue::_threadMain() Line 175	C++
OgreMain_d.dll!Ogre::DefaultWorkQueueBase::WorkerFunc::operator()() Line 760	C++
OgreMain_d.dll!std::_Invoker_functor::_Call<Ogre::DefaultWorkQueueBase::WorkerFunc>(Ogre::DefaultWorkQueueBase::WorkerFunc && _Obj)	C++
OgreMain_d.dll!std::invoke<Ogre::DefaultWorkQueueBase::WorkerFunc>(Ogre::DefaultWorkQueueBase::WorkerFunc && _Obj)	C++
OgreMain_d.dll!std::_LaunchPad<std::unique_ptr<std::tuple<Ogre::DefaultWorkQueueBase::WorkerFunc>,std::default_delete<std::tuple<Ogre::DefaultWorkQueueBase::WorkerFunc> > > >::_Execute<0>(std::tuple<Ogre::DefaultWorkQueueBase::WorkerFunc> & _Tup, std::integer_sequence<unsigned __int64,0> __formal) Line 239	C++
OgreMain_d.dll!std::_LaunchPad<std::unique_ptr<std::tuple<Ogre::DefaultWorkQueueBase::WorkerFunc>,std::default_delete<std::tuple<Ogre::DefaultWorkQueueBase::WorkerFunc> > > >::_Run(std::_LaunchPad<std::unique_ptr<std::tuple<Ogre::DefaultWorkQueueBase::WorkerFunc>,std::default_delete<std::tuple<Ogre::DefaultWorkQueueBase::WorkerFunc> > > > * _Ln) Line 245	C++
OgreMain_d.dll!std::_LaunchPad<std::unique_ptr<std::tuple<Ogre::DefaultWorkQueueBase::WorkerFunc>,std::default_delete<std::tuple<Ogre::DefaultWorkQueueBase::WorkerFunc> > > >::_Go() Line 231	C++
	OgreMain_d.dll!std::_Pad::_Call_func(void * _Data) Line 209	C++

having to poll the Ogre::WorkQueue looks slow even though i highly doubt this is the cause of the threading performance issues. if this container has 100 shaders it's a waste to acquire 100 locks, check the status, and resume.

Is the Ogre WorkQueue locking and using a mutex even when using "prepare"? Isn't the whole point of "prepare" that no locks and such are needed? Or maybe locking is used automatically when I thread that function as well, but I am not sure.

as for locking issues it's hard for me to tell the exact flow threading-wise from the code here. as long as you are not modifying the unordered_map while reading it inside the worker threads you are good to go. best practice would be to pass only the (integer) index and the Ogre::ResourcePtr into the worker thread and don't access the unordered_map unless you're on the main thread.

In WaitUntilAllCompleted (for MY_OWN_THREADED_STUFF), I have a list of resources already fetched, and it never adds/removes resources from that list until the function is done:

Code: Select all

// Do the threaded actions until they are done
CResourceManager_ThreadClass::list = &m_resources;
CThreadManager::SetupAndRunUntilDone(CDefine::GetInt("COMPILE_SHADERS_WANTED_AMOUNT_OF_THREADS"), CResourceManager_ThreadClass::list->Size(), &CResourceManager_ThreadClass::Finish, &CResourceManager_ThreadClass::CreateThread);
CResourceManager_ThreadClass::list = NULL;

The code there is written generically so I can use it in other places as well, for arbitrary purposes.

Though it is rather sad that I upgraded to a new version of Ogre to get this automatic system, when in fact it did not help me at all and I had to write my own code to do it. :lol:
But I guess the background loading of resource groups helps other games, as background loading of resources can be very good of course; I only needed to fix the shader compilation bottleneck.

loath
Platinum Sponsor
Posts: 290
Joined: Tue Jan 17, 2012 5:18 am
x 67

Re: Background compile shaders

Post by loath »

But I use those shaders in many materials, and my shaders usually have over 50 different params, which would make it extremely ugly if I had to repeat them in every material.

would shared parameters help in this case? ha ha last idea.
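if it does, the script side would look something like this (sketch from memory, names made up):

Code: Select all

// declared once, in any script
shared_params ocean_shared
{
    shared_param_named fogColour float4 0.2 0.3 0.4 1.0
    shared_param_named waveScale float 1.5
}

// referenced from every material that uses the shader
vertex_program_ref shader/ocean_vp
{
    shared_params_ref ocean_shared
    param_named_auto worldViewProj worldviewproj_matrix
}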

Is the Ogre WorkQueue locking and using a mutex even when using "prepare"?

i looked at the WorkQueue code and there are 3-5 locks used. this is old code and i think steve tried to build a general solution but it wound up being more complicated than necessary.

additionally, prepare() has a spin lock (which is pretty fast) but it's still a waste to loop and "poll" for a multithreaded scenario.

Though it is rather sad that I upgraded to a new version of Ogre to get this automatic system, when in fact it did not help me at all and I had to write my own code to do it.

the benefit of upgrading is that the prepare() function now does the compile and is publicly accessible. previously you had to modify ogre to make the loadHighLevel () function public instead of private. it's so easy to write an efficient producer/consumer in modern c++ (via std::thread or std::futures / async) that there is little benefit to this old creaky WorkQueue code with its 3-5 locks and hundreds of lines of code.

(i don't think the work queue stuff has changed in many years - including this build vs your previous build)

paroj
OGRE Team Member
Posts: 1994
Joined: Sun Mar 30, 2014 2:51 pm
x 1074

Re: Background compile shaders

Post by paroj »

there might very well be bugs in the Ogre WorkQueue implementation.
You could help debug this so the next one does not have to by:

  • verify that it indeed created 6 threads. The Root constructor clamps the number of threads to 2.
  • set LogManager::getSingleton().setMinLogLevel(LML_TRIVIAL) so the WorkQueue reports what it is currently doing

basically, it should behave just like your own implementation. The issue might be that the std::thread implementation was retrofitted and the original code was written against boost or even TBB.

rpgplayerrobin
Gnoll
Posts: 619
Joined: Wed Mar 18, 2009 3:03 am
x 353

Re: Background compile shaders

Post by rpgplayerrobin »

I debugged it now and it creates 6 threads in "DefaultWorkQueue::startup".

I debugged it a bit more and the code that waits for all threads to finish is the problem:

Code: Select all

// Start looping
ResourceBackgroundQueue& tmpResourceBackgroundQueue = ResourceBackgroundQueue::getSingleton();
while (true)
{
	// Loop through all resources and check if we have compiled all
	bool tmpAllDone = true;
	for (std::unordered_map<Ogre::WorkQueue::RequestID, CContent>::const_iterator tmpItr = m_resources.begin(); tmpItr != m_resources.end(); ++tmpItr)
	{
		Ogre::WorkQueue::RequestID tmpVar = tmpItr->first;
		if (!tmpResourceBackgroundQueue.isProcessComplete(tmpVar))
		{
			tmpAllDone = false;
			break;
		}
	}
	if (tmpAllDone)
		// Break out of the loop, we are done
		break;

// Continue compiling the shaders in the background
Ogre::Root::getSingleton().getWorkQueue()->processResponses();
}

Calling processResponses in a tight loop (like loath also mentioned) is very bad.
I thought at first it was meant to be used like this, but it seems I need to add a sleep ("Sleep(10);") in the loop, otherwise processResponses itself becomes the bottleneck (see the adjusted loop below).
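For clarity, this is the adjusted waiting loop (identical to the loop above, just with the sleep added):

Code: Select all

// Start looping
ResourceBackgroundQueue& tmpResourceBackgroundQueue = ResourceBackgroundQueue::getSingleton();
while (true)
{
	// Loop through all resources and check if we have compiled all
	bool tmpAllDone = true;
	for (std::unordered_map<Ogre::WorkQueue::RequestID, CContent>::const_iterator tmpItr = m_resources.begin(); tmpItr != m_resources.end(); ++tmpItr)
	{
		if (!tmpResourceBackgroundQueue.isProcessComplete(tmpItr->first))
		{
			tmpAllDone = false;
			break;
		}
	}
	if (tmpAllDone)
		// Break out of the loop, we are done
		break;

	// Sleep a bit so this loop does not become the bottleneck
	Sleep(10);

	// Continue compiling the shaders in the background
	Ogre::Root::getSingleton().getWorkQueue()->processResponses();
}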

When I then use 12 threads instead, it of course makes an extreme difference in speed.
It is still not comparable to my own hand-rolled threads though, since my own version is somehow 73% faster than the Ogre WorkQueue (4.8 seconds compared to 8.3 seconds startup time in my tests).

I tried to optimize the Ogre WorkQueue by changing the Sleep values and removing the LML_TRIVIAL log, but it had no real effect.
At least I know how it works now, and there is no bug in it, it is just pretty slow (though not as slow as I first thought).

loath
Platinum Sponsor
Posts: 290
Joined: Tue Jan 17, 2012 5:18 am
x 67

Re: Background compile shaders

Post by loath »

you could wait on a std::condition_variable instead of the polling loop. then in Finish () count the number of shaders compiled and when you hit the expected total then signal the std::condition_variable. i'd still recommend just using your own approach.
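something along these lines (typed here, not compiled):

Code: Select all

#include <condition_variable>
#include <mutex>

std::mutex countLock_;
std::condition_variable allDone_;
int compiledCount_ = 0;
int expectedCount_ = 0; // set this to the number of shaders before starting

// call this from Finish () each time one shader is done
void OnShaderCompiled ()
{
    std::lock_guard<std::mutex> guard (countLock_);
    ++compiledCount_;
    if (compiledCount_ == expectedCount_)
        allDone_.notify_one ();
}

// the main thread blocks here instead of polling with Sleep ()
void WaitForAllShaders ()
{
    std::unique_lock<std::mutex> guard (countLock_);
    allDone_.wait (guard, [] { return compiledCount_ == expectedCount_; });
}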

as it is today, the Work Queue just isn't useful enough for background shader compilation "out of the box". you would need something like your script code-change where ogre wouldn't apply the "default parameters" until after compilation.

in fact, the Work Queue really isn't useful for any of the resources... yes, you can background load textures for example, but that only loads the data from disk in prepare (); you still need to copy the data to the GPU, which causes frame hiccups. (to work around this, i "trickle" the GPU copies over many frames). skeleton data also doesn't really work "out of the box" (although paroj has made some changes and i have not looked into whether these solve the issue...). as a result i use the existing "listener" callbacks when loading meshes to stream in the skeletons.

paroj
OGRE Team Member
Posts: 1994
Joined: Sun Mar 30, 2014 2:51 pm
x 1074

Re: Background compile shaders

Post by paroj »

hmm.. it's a bummer that the Ogre::WorkQueue does not work for either of you. That means that we basically ship dead code/bloat for everyone.

Maybe we can improve the implementation so you both can use it? At its core it should be something like:
https://github.com/greyfade/workqueue/b ... adpool.hpp

Calling processResponses in a tight loop (like loath also mentioned) is very bad.
I thought at first it was meant to be used like this, but it seems I need to add a sleep ("Sleep(10);") in the loop, otherwise processResponses itself becomes the bottleneck.

do you mean you have to sleep inside processResponses or in the loop before calling processResponses?

The issue with the WorkQueue might be that it uses heavy-weight locking primitives with the STD implementation. This is due to usages outside of WorkQueue - so we could make WorkQueue faster by being more specific..

loath
Platinum Sponsor
Posts: 290
Joined: Tue Jan 17, 2012 5:18 am
x 67

Re: Background compile shaders

Post by loath »

the important part for me is to load resources without impacting the render thread / framerate.

the "thread pool" aspect is less interesting because it's so easy to do in modern c++ now. (as illustrated by your link).

  1. for example, the shader "background loads" didn't work until you fixed/moved LoadHighLevel () from load() to prepare().
  2. Ogre::Resource::setBackgroundLoaded () does not seem to do anything, at least not for me via the Ogre::BackgroundProcessTicket code (for non-shader stuff). rpgplayerrobin noted the same thing for his shaders.
  3. Ogre::Skeleton's don't background load because they're tied to meshes. (at least they didn't when i implemented my "background loading")
  4. Ogre::Textures prepare() from file but the blit to GPU still happens on the main thread (at least when i was on dx9).
rpgplayerrobin
Gnoll
Posts: 619
Joined: Wed Mar 18, 2009 3:03 am
x 353

Re: Background compile shaders

Post by rpgplayerrobin »

paroj wrote: Thu May 26, 2022 1:54 pm

Calling processResponses in a tight loop (like loath also mentioned) is very bad.
I thought at first it was meant to be used like this, but it seems I need to add a sleep ("Sleep(10);") in the loop, otherwise processResponses itself becomes the bottleneck.

do you mean you have to sleep inside processResponses or in the loop before calling processResponses?

Yeah, I use Sleep right before (or after, same thing) I use processResponses in that function, for every loop.

Maybe we can improve the implementation so you both can use it?

I must admit that my knowledge regarding these kinds of threading approaches is very limited.
I basically only made one threading function in my game and then use it everywhere for everything (std::future<void> with std::async).
I just create it and at a later point use "get" to finish it (and in this scenario for shaders I also poll whether it is ready/done each loop, after sleeping in the loop).
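Basically just this pattern (a simplified sketch):

Code: Select all

#include <chrono>
#include <future>

// Create it
std::future<void> tmpAnswer = std::async(std::launch::async, []() { /* do the work */ });

// Poll it in a loop (Sleep is the usual <windows.h> one I use elsewhere)
while (tmpAnswer.wait_for(std::chrono::seconds(0)) != std::future_status::ready)
	Sleep(10);

// Finish it
tmpAnswer.get();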

paroj
OGRE Team Member
Posts: 1994
Joined: Sun Mar 30, 2014 2:51 pm
x 1074

Re: Background compile shaders

Post by paroj »

took a closer look at the code. processResponses is meant to be only called once per frame and not inside the waiting loop. So the sleep you added makes sense.
In fact, processResponses will be called automatically by Root::_fireFrameEnded for you.

The discussion here made me evaluate the current WorkQueue design at https://github.com/OGRECave/ogre/issues/2484

In short: it is over-engineered

rpgplayerrobin
Gnoll
Posts: 619
Joined: Wed Mar 18, 2009 3:03 am
x 353

Re: Background compile shaders

Post by rpgplayerrobin »

In fact, processResponses will be called automatically by Root::_fireFrameEnded for you.

In that tight loop, I need it to finish before continuing. Of course I could just render instead of calling processResponses in that case though.
The next function after that one actually creates objects using the shaders, so it needs to be 100% done.

I realized I never posted the function that actually handled the multithreading with the tight loop, so here it is:

.h

Code: Select all

static int SetupAndRunUntilDone(int wantedAmountOfThreads, int listSize, void FinishFunc(int), void CreateThreadFunc(int, std::list<std::future<void>>& answers), int listStartIndex = 0, int numberOfObjectsToCreate = 0);

.cpp

Code: Select all

int YOUR_CLASS_NAME::SetupAndRunUntilDone(int wantedAmountOfThreads, int listSize, void FinishFunc(int), void CreateThreadFunc(int, std::list<std::future<void>>& answers), int listStartIndex, int numberOfObjectsToCreate)
{
	// Setup variables we need
	std::list<std::future<void>> tmpAnswers;
	CList<int> tmpAnswersIndices;
	int tmpFinishedCount = 0;
	int tmpNextIndex = listStartIndex;

if (listStartIndex > listSize)
	return 0;

int tmpNumberOfObjectsToCreate = numberOfObjectsToCreate;
if (tmpNumberOfObjectsToCreate == 0)
	tmpNumberOfObjectsToCreate = listSize;

while (listStartIndex + tmpNumberOfObjectsToCreate > listSize)
	tmpNumberOfObjectsToCreate--;

if (tmpNumberOfObjectsToCreate <= 0)
	return 0;

// Start looping
int tmpObjectsCreated = 0;
while (tmpNumberOfObjectsToCreate > tmpFinishedCount)
{
	// Start looping
	int tmpNumberOfThreadsWorking = 0;
	while (true)
	{
		// Loop through all threads
		bool tmpDeletedSomething = false;
		tmpNumberOfThreadsWorking = 0;
		int tmpItr = 0;
		for (std::list<std::future<void>>::iterator it = tmpAnswers.begin(); it != tmpAnswers.end(); ++it)
		{
			// Check if the current thread is done
			if (it->wait_for(std::chrono::seconds(0)) == std::future_status::ready)
			{
				// Get the results of the thread and finish it
				it->get();

				// Finish the object
				FinishFunc(tmpAnswersIndices[tmpItr]);

				// Add one to the finished count
				tmpFinishedCount++;

				// Set that we have deleted something
				tmpDeletedSomething = true;
				tmpAnswers.erase(it);
				tmpAnswersIndices.Remove(tmpItr);
				break;
			}
			else
				// Add one to the amount of threads working
				tmpNumberOfThreadsWorking++;

			tmpItr++;
		}

		// Check if we did not delete something
		if (!tmpDeletedSomething)
			// Break out of the loop, we are done
			break;
	}

	// Loop until we have created the number of threads we want
	while (tmpNumberOfThreadsWorking < wantedAmountOfThreads)
	{
		// Check if we cannot create another one
		if (tmpNextIndex >= listSize ||
			tmpObjectsCreated >= tmpNumberOfObjectsToCreate)
			// Break out of the loop, we are done
			break;

		// Add a new thread using the next object
		CreateThreadFunc(tmpNextIndex, tmpAnswers);
		tmpAnswersIndices.Add(tmpNextIndex);
		tmpNumberOfThreadsWorking++;
		tmpNextIndex++;
		tmpObjectsCreated++;
	}

	// Sleep a bit
	Sleep(5);
}

// Return how many elements we created
return tmpObjectsCreated;
}

And it is called exactly like I posted before (using the CResourceManager_ThreadClass from earlier to handle the callbacks):

Code: Select all

// Do the threaded actions until they are done
CResourceManager_ThreadClass::list = &m_resources;
CThreadManager::SetupAndRunUntilDone(CDefine::GetInt("COMPILE_SHADERS_WANTED_AMOUNT_OF_THREADS"), CResourceManager_ThreadClass::list->Size(), &CResourceManager_ThreadClass::Finish, &CResourceManager_ThreadClass::CreateThread);
CResourceManager_ThreadClass::list = NULL;