Light Creation/Deletion Performance Bug

rpgplayerrobin
Orc Shaman
Posts: 710
Joined: Wed Mar 18, 2009 3:03 am
x 391

Light Creation/Deletion Performance Bug

Post by rpgplayerrobin »

Ogre Version: 1.12.13
Operating System: Windows 10
Render System: Direct3D11 and Direct3D9

Hello!

I have been experiencing a bug that causes the FPS of my game to decrease when switching scenes, and it seems to have to do with Ogre.
It happens for both D3D9 and D3D11.

I wish I could debug it further than I have, but the issue only happens in Release, which makes it much harder to pinpoint in the Ogre source.

The closest I have come to pinpointing the issue is that it involves creating and destroying lights while rendering objects that receive lighting through shaders.
Using the code below reproduces the issue when using 2000+ objects in a scene:

Code: Select all

if (evt.keycode == CSDL_Keycode::SDLK_u)
{
	static std::vector<Light*> tmpLights;
	if (tmpLights.size() == 0)
	{
		// Create the lights
		for (int i = 0; i < 300; i++)
		{
			Light* tmpLight = app->m_SceneManager->createLight();
			SceneNode* tmpParent = app->m_SceneManager->getRootSceneNode()->createChildSceneNode();
			tmpParent->attachObject(tmpLight);

			tmpLight->setCastShadows(false);
			tmpLight->setType(Light::LT_POINT);
			tmpLight->setDiffuseColour(ColourValue::White);
			tmpLight->setSpecularColour(ColourValue::White);
			tmpLight->setAttenuation(1000.0f, 0.0f, 0.01f, 0.0f);

			tmpParent->setPosition(Vector3(0.0f, 2.0f, 0.0f));

			tmpLights.push_back(tmpLight);
		}
	}
	else
	{
		// Destroy the lights
		for (int i = 0; i < (int)tmpLights.size(); i++)
		{
			Light* tmpLight = tmpLights[i];

			SceneNode* tmpParent = tmpLight->getParentSceneNode();
			tmpLight->detachFromParent();
			app->m_SceneManager->destroyLight(tmpLight);
			app->m_SceneManager->destroySceneNode(tmpParent);
		}

		// Clear the list of lights
		tmpLights.clear();
	}
}

I simply press a button to trigger the code above: the first press creates 300 lights and the next press destroys them.
After a few presses the FPS drops dramatically even though there are no lights left in the scene (60 FPS down to 20 FPS in my test scene).

The expected behaviour is that the FPS returns to its original value once the lights are destroyed.
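
To make a before/after comparison like this less dependent on reading an on-screen counter, the frame times could be averaged over a fixed number of frames. The following is only a minimal sketch: FrameStats is a hypothetical helper, not part of Ogre, and it assumes the frame durations are supplied by the application in milliseconds.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper (not part of Ogre): accumulates frame durations and
// reports the average, so "before" and "after" runs can be compared.
class FrameStats
{
public:
	// Record one frame that took `milliseconds` to render.
	void addFrame(double milliseconds)
	{
		assert(milliseconds > 0.0);
		mTotalMs += milliseconds;
		++mFrames;
	}

	// Average time per frame in milliseconds (0 if nothing was recorded).
	double averageFrameMs() const
	{
		return mFrames == 0 ? 0.0 : mTotalMs / static_cast<double>(mFrames);
	}

	// Average frames per second (0 if nothing was recorded).
	double averageFps() const
	{
		const double ms = averageFrameMs();
		return ms == 0.0 ? 0.0 : 1000.0 / ms;
	}

private:
	double mTotalMs = 0.0;
	std::size_t mFrames = 0;
};
```

One instance could be filled before the first button press and another after the lights have been destroyed, for example from evt.timeSinceLastFrame * 1000.0 in frameRenderingQueued. In frame-time terms, the drop from 60 FPS to 20 FPS corresponds to roughly 16.7 ms per frame rising to 50 ms per frame.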

Here is a video showcasing the bug as well:

Note that the bug sometimes appears much more quickly than in the video; it seems to be somewhat random.
I can also spam click the button to make the issue appear very quickly.

If I force the shaders to be reloaded by pressing another button (code below), the FPS returns to normal, but only for D3D11, not for D3D9. That is not a viable workaround, since every shader would have to be reloaded each time a light is destroyed.

Code: Select all

std::vector<GpuProgramPtr> tmpGPUPrograms;
ResourceManager::ResourceMapIterator tmpItr = GpuProgramManager::getSingleton().getResourceIterator();
if (tmpItr.begin() == tmpItr.end())
	tmpItr = HighLevelGpuProgramManager::getSingleton().getResourceIterator();
for (ResourceManager::ResourceMapIterator::const_iterator i = tmpItr.begin(); i != tmpItr.end(); ++i)
{
	const ResourcePtr tmpResourcePtr = i->second;

	CString tmpName = tmpResourcePtr->getName();
	if (tmpName == "HighShader_AlphaRejection_VS" ||
		tmpName == "HighShader_AlphaRejection_PS") // The shaders on the object that is rendered 2000+ times
	{
		GpuProgramPtr tmpGpuProgram = GpuProgramManager::getSingleton().getByName(tmpName);
		if (tmpGpuProgram)
			tmpGPUPrograms.push_back(tmpGpuProgram);
	}
}

for (int i = 0; i < (int)tmpGPUPrograms.size(); i++)
	tmpGPUPrograms[i]->reload();

For D3D9, spam-clicking a button that triggers the code below also fixes the issue, but only temporarily: if I start creating/destroying more lights, the issue comes back and never goes away again.

Code: Select all

MaterialManager::getSingleton().setDefaultTextureFiltering(Ogre::TFO_NONE);
MaterialManager::getSingleton().setDefaultAnisotropy(1);
app->m_Root->renderOneFrame(0.0f);
MaterialManager::getSingleton().setDefaultTextureFiltering(Ogre::TFO_ANISOTROPIC);
MaterialManager::getSingleton().setDefaultAnisotropy(16);

What I think is happening is that the objects are for some reason holding on to the old light list (the 300 lights), which makes rendering them very slow.
The objects are not showing the destroyed lights when they are rendered though, so I am probably wrong about my hypothesis. But something is for sure cached incorrectly.
My hypotheses are impossible to test, however, since this issue does not happen in Debug. There the FPS always returns to the expected value when the lights are destroyed, which makes it impossible for me to debug Ogre.

The 2000+ objects in the scene are all the same object as a test and its material receives light by using these parameters (so it is a bit different from RTSS):

Code: Select all

param_named_auto lightDiffuse0 light_diffuse_colour 0
param_named_auto lightSpecular0 light_specular_colour 0
param_named_auto lightDirection0 light_direction 0

param_named_auto lightDiffuse1 light_diffuse_colour 1
param_named_auto lightSpecular1 light_specular_colour 1
param_named_auto lightAttenuation1 light_attenuation 1
param_named_auto lightPosition1 light_position 1
param_named_auto lightSpotParams1 spotlight_params 1
param_named_auto lightDirection1 light_direction 1

param_named_auto lightDiffuse2 light_diffuse_colour 2
param_named_auto lightSpecular2 light_specular_colour 2
param_named_auto lightAttenuation2 light_attenuation 2
param_named_auto lightPosition2 light_position 2
param_named_auto lightSpotParams2 spotlight_params 2
param_named_auto lightDirection2 light_direction 2

param_named_auto lightDiffuse3 light_diffuse_colour 3
param_named_auto lightSpecular3 light_specular_colour 3
param_named_auto lightAttenuation3 light_attenuation 3
param_named_auto lightPosition3 light_position 3
param_named_auto lightSpotParams3 spotlight_params 3
param_named_auto lightDirection3 light_direction 3

param_named_auto lightDiffuse4 light_diffuse_colour 4
param_named_auto lightSpecular4 light_specular_colour 4
param_named_auto lightAttenuation4 light_attenuation 4
param_named_auto lightPosition4 light_position 4
param_named_auto lightSpotParams4 spotlight_params 4
param_named_auto lightDirection4 light_direction 4

param_named_auto lightDiffuse5 light_diffuse_colour 5
param_named_auto lightSpecular5 light_specular_colour 5
param_named_auto lightAttenuation5 light_attenuation 5
param_named_auto lightPosition5 light_position 5
param_named_auto lightSpotParams5 spotlight_params 5
param_named_auto lightDirection5 light_direction 5

param_named_auto lightDiffuse6 light_diffuse_colour 6
param_named_auto lightSpecular6 light_specular_colour 6
param_named_auto lightAttenuation6 light_attenuation 6
param_named_auto lightPosition6 light_position 6
param_named_auto lightSpotParams6 spotlight_params 6
param_named_auto lightDirection6 light_direction 6

param_named_auto lightDiffuse7 light_diffuse_colour 7
param_named_auto lightSpecular7 light_specular_colour 7
param_named_auto lightAttenuation7 light_attenuation 7
param_named_auto lightPosition7 light_position 7
param_named_auto lightSpotParams7 spotlight_params 7
param_named_auto lightDirection7 light_direction 7

param_named_auto lightDiffuse8 light_diffuse_colour 8
param_named_auto lightSpecular8 light_specular_colour 8
param_named_auto lightAttenuation8 light_attenuation 8
param_named_auto lightPosition8 light_position 8
param_named_auto lightSpotParams8 spotlight_params 8
param_named_auto lightDirection8 light_direction 8

param_named_auto lightDiffuse9 light_diffuse_colour 9
param_named_auto lightSpecular9 light_specular_colour 9
param_named_auto lightAttenuation9 light_attenuation 9
param_named_auto lightPosition9 light_position 9
param_named_auto lightSpotParams9 spotlight_params 9
param_named_auto lightDirection9 light_direction 9

param_named_auto lightDiffuse10 light_diffuse_colour 10
param_named_auto lightSpecular10 light_specular_colour 10
param_named_auto lightAttenuation10 light_attenuation 10
param_named_auto lightPosition10 light_position 10
param_named_auto lightSpotParams10 spotlight_params 10
param_named_auto lightDirection10 light_direction 10

param_named_auto lightDiffuse11 light_diffuse_colour 11
param_named_auto lightSpecular11 light_specular_colour 11
param_named_auto lightAttenuation11 light_attenuation 11
param_named_auto lightPosition11 light_position 11
param_named_auto lightSpotParams11 spotlight_params 11
param_named_auto lightDirection11 light_direction 11

param_named_auto lightDiffuse12 light_diffuse_colour 12
param_named_auto lightSpecular12 light_specular_colour 12
param_named_auto lightAttenuation12 light_attenuation 12
param_named_auto lightPosition12 light_position 12
param_named_auto lightSpotParams12 spotlight_params 12
param_named_auto lightDirection12 light_direction 12

param_named_auto lightDiffuse13 light_diffuse_colour 13
param_named_auto lightSpecular13 light_specular_colour 13
param_named_auto lightAttenuation13 light_attenuation 13
param_named_auto lightPosition13 light_position 13
param_named_auto lightSpotParams13 spotlight_params 13
param_named_auto lightDirection13 light_direction 13

param_named_auto lightDiffuse14 light_diffuse_colour 14
param_named_auto lightSpecular14 light_specular_colour 14
param_named_auto lightAttenuation14 light_attenuation 14
param_named_auto lightPosition14 light_position 14
param_named_auto lightSpotParams14 spotlight_params 14
param_named_auto lightDirection14 light_direction 14

param_named_auto lightDiffuse15 light_diffuse_colour 15
param_named_auto lightSpecular15 light_specular_colour 15
param_named_auto lightAttenuation15 light_attenuation 15
param_named_auto lightPosition15 light_position 15
param_named_auto lightSpotParams15 spotlight_params 15
param_named_auto lightDirection15 light_direction 15

param_named_auto lightDiffuse16 light_diffuse_colour 16
param_named_auto lightSpecular16 light_specular_colour 16
param_named_auto lightAttenuation16 light_attenuation 16
param_named_auto lightPosition16 light_position 16
param_named_auto lightSpotParams16 spotlight_params 16
param_named_auto lightDirection16 light_direction 16

param_named_auto lightDiffuse17 light_diffuse_colour 17
param_named_auto lightSpecular17 light_specular_colour 17
param_named_auto lightAttenuation17 light_attenuation 17
param_named_auto lightPosition17 light_position 17
param_named_auto lightSpotParams17 spotlight_params 17
param_named_auto lightDirection17 light_direction 17

param_named_auto lightDiffuse18 light_diffuse_colour 18
param_named_auto lightSpecular18 light_specular_colour 18
param_named_auto lightAttenuation18 light_attenuation 18
param_named_auto lightPosition18 light_position 18
param_named_auto lightSpotParams18 spotlight_params 18
param_named_auto lightDirection18 light_direction 18

param_named_auto lightDiffuse19 light_diffuse_colour 19
param_named_auto lightSpecular19 light_specular_colour 19
param_named_auto lightAttenuation19 light_attenuation 19
param_named_auto lightPosition19 light_position 19
param_named_auto lightSpotParams19 spotlight_params 19
param_named_auto lightDirection19 light_direction 19

Does anyone know what could cause this issue?
Simply creating and destroying lights should not cause the FPS to drop like this, and since it only happens in Release I am not sure what to do.
Also, there are no errors or exceptions in the Ogre log.

In a normal game scene, this issue is on a lower scale, but simply going between two scenes can decrease FPS by 20%, which is bad. The issue actually transfers over through scenes after everything has been destroyed and created again.

rpgplayerrobin

Re: Light Creation/Deletion Performance Bug

Post by rpgplayerrobin »

This also seems to happen on the latest version of Ogre (v14.3.2 – 25. November 2024).

Video example:

Here is the code (place it in CubeMapping.h):

Code: Select all

#ifndef __CubeMapping_H__
#define __CubeMapping_H__

#include "SdkSample.h"

using namespace Ogre;
using namespace OgreBites;

class _OgreSampleClassExport Sample_CubeMapping : public SdkSample, public RenderTargetListener
{
public:

Sample_CubeMapping()
{
    mInfo["Title"] = "Cube Mapping";
    mInfo["Description"] = "Demonstrates the cube mapping feature where a wrap-around environment is reflected "
        "off of an object. Uses render-to-texture to create dynamic cubemaps.";
    mInfo["Thumbnail"] = "thumb_cubemap.png";
    mInfo["Category"] = "Unsorted";
}

bool frameRenderingQueued(const FrameEvent& evt) override
{
    return SdkSample::frameRenderingQueued(evt);      // don't forget the parent updates!
}

bool mouseReleased(const MouseButtonEvent& evt) override
{
	if (mTrayMgr->mouseReleased(evt))
		return true;
	if (evt.button == BUTTON_LEFT)
		mTrayMgr->showCursor();  // unhide the cursor if user lets go of LMB

	if (evt.button == BUTTON_RIGHT)
	{
		static std::vector<Light*> tmpLights;
		if (tmpLights.size() == 0)
		{
			// Create the lights
			for (int i = 0; i < 300; i++)
			{
				Light* tmpLight = mSceneMgr->createLight();
				SceneNode* tmpParent = mSceneMgr->getRootSceneNode()->createChildSceneNode();
				tmpParent->attachObject(tmpLight);

				tmpLight->setCastShadows(false);
				tmpLight->setType(Light::LT_POINT);
				tmpLight->setDiffuseColour(ColourValue(0.01f, 0.01f, 0.01f));
				tmpLight->setSpecularColour(ColourValue(0.01f, 0.01f, 0.01f));
				tmpLight->setAttenuation(1000000.0f, 0.0f, 0.01f, 0.0f);

				tmpParent->setPosition(Vector3(0.0f, 2.0f, 0.0f));

				tmpLights.push_back(tmpLight);
			}
		}
		else
		{
			// Destroy the lights
			for (int i = 0; i < (int)tmpLights.size(); i++)
			{
				Light* tmpLight = tmpLights[i];

				SceneNode* tmpParent = tmpLight->getParentSceneNode();
				tmpLight->detachFromParent();
				mSceneMgr->destroyLight(tmpLight);
				mSceneMgr->destroySceneNode(tmpParent);
			}

			// Clear the list of lights
			tmpLights.clear();
		}
	}

	return true;
}

protected:

// Helper functions taken from my main project
Vector3 __GetPosition(Camera* obj)
{
	return obj->getDerivedPosition();
}
void __SetPosition(Camera* obj, const Vector3& vec)
{
	Node* tmpNode = obj->getParentNode();
	tmpNode->setPosition(vec);
}
void __LookAt(Camera* obj, const Vector3& pos)
{
	Vector3 tmpPosition = __GetPosition(obj);
	__SetDirection(obj, (pos - tmpPosition).normalisedCopy(), true);
}
Quaternion __GetOrientation(Camera* obj)
{
	return obj->getDerivedOrientation();
}
void __SetDirection(Camera* obj, const Vector3& vec, bool yawFixed)
{
	Quaternion mOrientation = Quaternion::IDENTITY;

	if (vec == Vector3::ZERO) return;

	Vector3 zAdjustVec = -vec;
	zAdjustVec.normalise();

	Quaternion targetWorldOrientation;

	if (yawFixed)
	{
		Vector3 mYawFixedAxis = Vector3::UNIT_Y;

		Vector3 xVec = mYawFixedAxis.crossProduct(zAdjustVec);
		xVec.normalise();

		Vector3 yVec = zAdjustVec.crossProduct(xVec);
		yVec.normalise();

		targetWorldOrientation.FromAxes(xVec, yVec, zAdjustVec);
	}
	else
	{
		Quaternion mRealOrientation = __GetOrientation(obj);

		Vector3 axes[3];
		mRealOrientation.ToAxes(axes);
		Quaternion rotQuat;
		if ((axes[2] + zAdjustVec).squaredLength() < 0.00005f)
		{
			rotQuat.FromAngleAxis(Radian(Math::PI), axes[1]);
		}
		else
		{
			rotQuat = axes[2].getRotationTo(zAdjustVec);
		}
		targetWorldOrientation = rotQuat * mRealOrientation;
	}

	mOrientation = targetWorldOrientation;

	SceneNode* tmpNode = obj->getParentSceneNode();
	tmpNode->_setDerivedOrientation(mOrientation);
	tmpNode->_update(true, false);
}

void setupContent() override
{
	const float tmpSize = 70.0f;
	const float tmpStepSize = 2.0f;
	for (float x = -tmpSize; x < tmpSize; x += tmpStepSize)
	{
		for (float y = -tmpSize; y < tmpSize; y += tmpStepSize)
		{
			Entity* tmpEntity = mSceneMgr->createEntity("foliage_bush0.mesh");
			SceneNode* tmpNode = mSceneMgr->getRootSceneNode()->createChildSceneNode();
			tmpNode->attachObject(tmpEntity);
			tmpNode->setPosition(Vector3(x, 0.0f, y));
		}
	}

	__SetPosition(mCamera, Vector3(100.0f, 100.0f, 100.0f));
	__LookAt(mCamera, Vector3::ZERO);

	mCameraMan->setStyle(CS_MANUAL);
}

void cleanupContent() override
{
}
};

#endif

Here are also all files needed (simply extract it and it will add all files correctly):

Before you open Ogre, make sure it has nothing cached (remove C:\Users\YOUR_USERNAME\Documents\OGRE Sample Browser), then start it in Release with Direct3D9 and VSync off.
Then open the CubeMapping sample through the SampleBrowser and press the right mouse button a couple of times; the FPS will then stay lower than it was before you pressed it.

sercero
Bronze Sponsor
Posts: 487
Joined: Sun Jan 18, 2015 4:20 pm
Location: Buenos Aires, Argentina
x 170

Re: Light Creation/Deletion Performance Bug

Post by sercero »

Have you tested this with OpenGL?

Another thing: isn't it possible to debug the issue with RenderDoc?

rpgplayerrobin

Re: Light Creation/Deletion Performance Bug

Post by rpgplayerrobin »

sercero wrote:
Have you tested this with OpenGL?

That would require writing shaders for it, and just that would take me a long time to do since I am not familiar with it.
But it does happen for D3D9 and D3D11 in my tests at least (in the newest Ogre source version I have only tested it for D3D9 though).

sercero wrote:
Another thing: isn't it possible to debug the issue with RenderDoc?

I have attempted to do graphics debugging a lot, and the batches count and such are no different at all (which I can see in the Visual Studio graphics debugger).
What I think is happening is that somewhere deep in Ogre code, it just forgets to clear something in the shaders/materials when destroying lights, making the FPS drop.

For example, if I simply return mBlankLight in AutoParamDataSource::getLight, this issue never happens at all. So creating and destroying the lights only has an effect if materials/shaders actually receive them, which leads me to believe it is a shader/material cache issue, but I am still unsure where the issue is in the actual code.

rpgplayerrobin

Re: Light Creation/Deletion Performance Bug

Post by rpgplayerrobin »

Also, the issue can somewhat be fixed by switching the filtering method after the bug has happened even in the Ogre Source version.
But as mentioned in my first post, the issue comes back if you keep creating/deleting the lights again, and then the filtering switching method does nothing to fix the issue again (it only solves the issue once).

This also leads me to believe that the issue has to do with materials and a broken cache somewhere.

Full code (filtering method is now changed by clicking the left mouse button):

Code: Select all

#ifndef __CubeMapping_H__
#define __CubeMapping_H__

#include "SdkSample.h"

using namespace Ogre;
using namespace OgreBites;

class _OgreSampleClassExport Sample_CubeMapping : public SdkSample, public RenderTargetListener
{
public:

Sample_CubeMapping()
{
    mInfo["Title"] = "Cube Mapping";
    mInfo["Description"] = "Demonstrates the cube mapping feature where a wrap-around environment is reflected "
        "off of an object. Uses render-to-texture to create dynamic cubemaps.";
    mInfo["Thumbnail"] = "thumb_cubemap.png";
    mInfo["Category"] = "Unsorted";
}

bool frameRenderingQueued(const FrameEvent& evt) override
{
    return SdkSample::frameRenderingQueued(evt);      // don't forget the parent updates!
}

bool mouseReleased(const MouseButtonEvent& evt) override
{
	if (mTrayMgr->mouseReleased(evt))
		return true;
	if (evt.button == BUTTON_LEFT)
	{
		static bool tmp = false;
		if (!tmp)
		{
			MaterialManager::getSingleton().setDefaultTextureFiltering(Ogre::TFO_NONE);
			MaterialManager::getSingleton().setDefaultAnisotropy(1);
		}
		else
		{
			MaterialManager::getSingleton().setDefaultTextureFiltering(Ogre::TFO_ANISOTROPIC);
			MaterialManager::getSingleton().setDefaultAnisotropy(16);
		}
		tmp = !tmp;
	}

	if (evt.button == BUTTON_RIGHT)
	{
		static std::vector<Light*> tmpLights;
		if (tmpLights.size() == 0)
		{
			// Create the lights
			for (int i = 0; i < 300; i++)
			{
				Light* tmpLight = mSceneMgr->createLight();
				SceneNode* tmpParent = mSceneMgr->getRootSceneNode()->createChildSceneNode();
				tmpParent->attachObject(tmpLight);

				tmpLight->setCastShadows(false);
				tmpLight->setType(Light::LT_POINT);
				tmpLight->setDiffuseColour(ColourValue(0.01f, 0.01f, 0.01f));
				tmpLight->setSpecularColour(ColourValue(0.01f, 0.01f, 0.01f));
				tmpLight->setAttenuation(1000000.0f, 0.0f, 0.01f, 0.0f);

				tmpParent->setPosition(Vector3(0.0f, 2.0f, 0.0f));

				tmpLights.push_back(tmpLight);
			}
		}
		else
		{
			// Destroy the lights
			for (int i = 0; i < (int)tmpLights.size(); i++)
			{
				Light* tmpLight = tmpLights[i];

				SceneNode* tmpParent = tmpLight->getParentSceneNode();
				tmpLight->detachFromParent();
				mSceneMgr->destroyLight(tmpLight);
				mSceneMgr->destroySceneNode(tmpParent);
			}

			// Clear the list of lights
			tmpLights.clear();
		}
	}

	return true;
}

protected:

// Helper functions taken from my main project
Vector3 __GetPosition(Camera* obj)
{
	return obj->getDerivedPosition();
}
void __SetPosition(Camera* obj, const Vector3& vec)
{
	Node* tmpNode = obj->getParentNode();
	tmpNode->setPosition(vec);
}
void __LookAt(Camera* obj, const Vector3& pos)
{
	Vector3 tmpPosition = __GetPosition(obj);
	__SetDirection(obj, (pos - tmpPosition).normalisedCopy(), true);
}
Quaternion __GetOrientation(Camera* obj)
{
	return obj->getDerivedOrientation();
}
void __SetDirection(Camera* obj, const Vector3& vec, bool yawFixed)
{
	Quaternion mOrientation = Quaternion::IDENTITY;

	if (vec == Vector3::ZERO) return;

	Vector3 zAdjustVec = -vec;
	zAdjustVec.normalise();

	Quaternion targetWorldOrientation;

	if (yawFixed)
	{
		Vector3 mYawFixedAxis = Vector3::UNIT_Y;

		Vector3 xVec = mYawFixedAxis.crossProduct(zAdjustVec);
		xVec.normalise();

		Vector3 yVec = zAdjustVec.crossProduct(xVec);
		yVec.normalise();

		targetWorldOrientation.FromAxes(xVec, yVec, zAdjustVec);
	}
	else
	{
		Quaternion mRealOrientation = __GetOrientation(obj);

		Vector3 axes[3];
		mRealOrientation.ToAxes(axes);
		Quaternion rotQuat;
		if ((axes[2] + zAdjustVec).squaredLength() < 0.00005f)
		{
			rotQuat.FromAngleAxis(Radian(Math::PI), axes[1]);
		}
		else
		{
			rotQuat = axes[2].getRotationTo(zAdjustVec);
		}
		targetWorldOrientation = rotQuat * mRealOrientation;
	}

	mOrientation = targetWorldOrientation;

	SceneNode* tmpNode = obj->getParentSceneNode();
	tmpNode->_setDerivedOrientation(mOrientation);
	tmpNode->_update(true, false);
}

void setupContent() override
{
	const float tmpSize = 70.0f;
	const float tmpStepSize = 2.0f;
	for (float x = -tmpSize; x < tmpSize; x += tmpStepSize)
	{
		for (float y = -tmpSize; y < tmpSize; y += tmpStepSize)
		{
			Entity* tmpEntity = mSceneMgr->createEntity("foliage_bush0.mesh");
			SceneNode* tmpNode = mSceneMgr->getRootSceneNode()->createChildSceneNode();
			tmpNode->attachObject(tmpEntity);
			tmpNode->setPosition(Vector3(x, 0.0f, y));
		}
	}

	__SetPosition(mCamera, Vector3(100.0f, 100.0f, 100.0f));
	__LookAt(mCamera, Vector3::ZERO);

	mCameraMan->setStyle(CS_MANUAL);

	MaterialManager::getSingleton().setDefaultTextureFiltering(Ogre::TFO_ANISOTROPIC);
	MaterialManager::getSingleton().setDefaultAnisotropy(16);
}

void cleanupContent() override
{
}
};

#endif

paroj
OGRE Team Member
Posts: 2128
Joined: Sun Mar 30, 2014 2:51 pm
x 1141

Re: Light Creation/Deletion Performance Bug

Post by paroj »

It will take me about 2 weeks until I can look at this.

Some hints/debugging tips:

  • The RTSS shaders only grow the light count in the shader to avoid re-compilation. So if you had 100 lights once, you pay for it even if you go back to 1 light.
  • Try a RelWithDebInfo build on MSVC. There you get the release stdlib, but still some debug info to step through.
  • If you think it's the light list, you can add some printfs to quickly verify.
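
The grow-only behaviour described in the first hint can be illustrated with a toy model. This is only a sketch under stated assumptions: GrowOnlyLightCount and its members are hypothetical names and this is not RTSS source code; it only models the rule that the compiled light count increases when needed and never decreases.

```cpp
#include <cassert>
#include <cstddef>

// Toy model only; this is not RTSS source code.
struct GrowOnlyLightCount
{
	std::size_t compiledLights = 0; // light count the "shader" was last built for
	std::size_t recompiles = 0;     // how many times the "shader" was rebuilt

	// Called whenever the number of lights in the scene changes.
	void sync(std::size_t sceneLights)
	{
		if (sceneLights > compiledLights)
		{
			compiledLights = sceneLights; // grow: rebuild for more lights
			++recompiles;
		}
		// Fewer lights: keep the larger shader to avoid a recompile.
		assert(compiledLights >= sceneLights);
	}
};
```

In this model, calling sync(100) and then sync(1) leaves compiledLights at 100 with a single rebuild, which is the "you pay for it even if you go back to 1 light" effect.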
rpgplayerrobin

Re: Light Creation/Deletion Performance Bug

Post by rpgplayerrobin »

paroj wrote:
the RTSS shaders only grow the light count in the shader to avoid re-compilation. So if you had 100 lights once, you pay for it even if you go back to 1 light.

The RTSS is not used at all on the objects in the scene. But I guess it could be behind the scenes somewhere or on the UI.
But, my scene in my game is not using RTSS at all, and it has the exact same FPS bug on the same scene.
Even if it was RTSS, it does not explain why the FPS goes back up by switching the filtering method.

paroj wrote:
try a relwithdebinfo build on MSVC. There you get the release stdlib, but still some debug info to step through.

The same issue does happen here. That means that it will be easier to debug for sure.
I will debug to see if I can find the issue more in detail.

paroj wrote:
if you think it's the light list, you can add some printfs to quickly verify

I did this in:
MovableObject::queryLights
SceneManager::updateCachedLightInfos
SceneManager::findLightsAffectingFrustum
AutoParamDataSource::setCurrentLightList
But the result is either 0 or 300 lights, which seems to be correct.

rpgplayerrobin

Re: Light Creation/Deletion Performance Bug

Post by rpgplayerrobin »

I made the camera come closer to the objects, which made the FPS difference even stronger (800 FPS down to 160 FPS when the bug happens).

I altered the shaders to use D3D11 instead and the bug also exists in D3D11 in the newest version of Ogre.
I mostly did this so I could debug using RenderDoc and Nsight. However, I did not manage to find any differences at all (comparing before and after the bug had occurred).
It might be that their timers need an API to work more accurately, but I don't think that can simply be activated with Ogre?

I also updated my graphics card driver (and of course cleared the shader cache) and it made no difference.
I also tried this bug on another computer with another graphics card, and the exact same bug happens there.

Also, the number of lights created each click does not matter much, even 3000 lights created and destroyed has the exact same performance as only creating and destroying 300 each click.
That means that it somehow reaches a "maximum value", which seems to be limited by the amount of lights in the shader. If I have 20 lights in the shader, the bug makes the FPS much lower than if the shader only had 2 lights.
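
The "maximum value" observation is consistent with the bounds check in AutoParamDataSource::getLight (quoted later in this post): a shader slot only receives a real light if its index is inside the current light list, otherwise it gets the blank light. Below is a toy model of that rule, a sketch only. ToyLight, getLightForSlot and countRealLightSlots are hypothetical names, and the model ignores everything else Ogre does when building light lists.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy stand-in for Ogre::Light; only tracks whether it is the blank fallback.
struct ToyLight
{
	bool blank;
};

// Mirrors the bounds check in AutoParamDataSource::getLight:
// in-range indices return the real light, everything else the blank one.
const ToyLight& getLightForSlot(const std::vector<ToyLight>& lightList,
                                std::size_t index, const ToyLight& blankLight)
{
	return index < lightList.size() ? lightList[index] : blankLight;
}

// Counts how many shader slots receive a real (non-blank) light.
std::size_t countRealLightSlots(std::size_t sceneLights, std::size_t shaderSlots)
{
	const std::vector<ToyLight> lightList(sceneLights, ToyLight{false});
	const ToyLight blankLight{true};

	std::size_t realSlots = 0;
	for (std::size_t slot = 0; slot < shaderSlots; ++slot)
	{
		if (!getLightForSlot(lightList, slot, blankLight).blank)
			++realSlots;
	}
	assert(realSlots <= shaderSlots);
	return realSlots;
}
```

With the 20 light slots declared in the material, countRealLightSlots(300, 20) and countRealLightSlots(3000, 20) both give 20, while a shader with 2 slots gives 2. In this model the per-object cost is bounded by the slot count rather than by the number of lights created, which matches the behaviour described above.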

I also realized that RTSS was actually compiling some shaders in the background when I changed the light amount, so I made sure that SGScheme::synchronizeWithLightSettings just returned to disable that behaviour completely.
It did not fix anything regarding the bug though.

I made sure to use "shrink_to_fit" on mLightList (per object) and mLightsAffectingFrustum to make them actually also get a capacity of 0 instead of 300 when the lights are destroyed, but that had no effect on FPS at all.
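
For reference, the difference between clearing a vector and shrinking it can be shown in isolation. This is a minimal sketch using a plain std::vector<int> as a stand-in for the light lists; CapacityReport and measureCapacities are hypothetical names. clear() leaves the capacity unchanged, and shrink_to_fit() is a non-binding request to release it.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Capacities observed at each step of the experiment.
struct CapacityReport
{
	std::size_t afterFill;
	std::size_t afterClear;
	std::size_t afterShrink;
};

// Fills a vector with `lightCount` elements, clears it, then asks it to shrink.
CapacityReport measureCapacities(std::size_t lightCount)
{
	std::vector<int> lights(lightCount, 0); // stand-in for a list of Light*
	CapacityReport report{};
	report.afterFill = lights.capacity();

	lights.clear(); // size becomes 0, capacity is kept
	report.afterClear = lights.capacity();

	lights.shrink_to_fit(); // non-binding request to drop unused capacity
	report.afterShrink = lights.capacity();

	assert(lights.empty());
	return report;
}
```

For measureCapacities(300), afterFill is at least 300 and afterClear equals afterFill. Common standard library implementations then report 0 for afterShrink, although the standard does not guarantee it. Since releasing the capacity changed nothing here, leftover vector storage is unlikely to be what keeps the FPS low.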

I tried to attach the 300 lights to a single scene node instead, but that had no effect on the bug.

But for D3D11, if the shader is reloaded, the bug fixes itself unless you start creating/destroying lights again. But reloading a shader like that is very slow and should not happen each time a light is destroyed.
I added the shader reload code to the middle mouse button in the sample code further down as well.

In another test, I no longer created any lights, instead I generated a random light in AutoParamDataSource::getLight (or for AutoParamDataSource::getLightDiffuseColour + getLightSpecularColour):

Code: Select all

#include "windows.h"
bool IsCapslockToggled()
{
	// Return whether or not capslock is toggled
	return (GetKeyState(VK_CAPITAL) & 0x0001) != 0;
}
const Light& AutoParamDataSource::getLight(size_t index) const
{
	if (IsCapslockToggled())
	{
		static Light newLight;
		static SceneNode newNode(NULL);
		if (!newLight.isAttached())
		{
			newNode.attachObject(&newLight);
			newLight.setAttenuation(0, 1, 0, 0);
		}

		newLight.setDiffuseColour(ColourValue(Math::RangeRandom(0.0f, 0.1f), Math::RangeRandom(0.0f, 0.1f), Math::RangeRandom(0.0f, 0.1f), 1.0f));
		newLight.setSpecularColour(ColourValue(Math::RangeRandom(0.0f, 0.1f), Math::RangeRandom(0.0f, 0.1f), Math::RangeRandom(0.0f, 0.1f), 1.0f));

		return newLight;
	}

	// If outside light range, return a blank light to ensure zeroised for program
	if (mCurrentLightList && index < mCurrentLightList->size())
	{
		return *((*mCurrentLightList)[index]);
	}
	else
	{
		return mBlankLight;
	}
}

The same FPS bug also showed up here (which also lasts between scenes) if you just toggle capslock for a second or two, but the FPS was not stuck as low as when creating actual lights for some reason.
But this at least shows that actual lights are not what is causing this bug, instead it has to do with setting them to the shader for some reason.
But the entire scene should be cleaned when pressing Stop Sample, which makes this issue so strange since it somehow persists through all that (but the shader is not unloaded then of course, so that is most likely the issue).

I realized the shader did not use all of its variables from the lights, so I made sure that all shader parameters and their xyzw components were used so that they would not be compiled away, and that made the bug even stronger than before (even lower FPS).
Here are the resources needed, and it now instead uses D3D11 for its shaders:
https://drive.google.com/drive/folders/ ... drive_link

And here is the new CubeMapping.h file:

Code: Select all

#ifndef __CubeMapping_H__
#define __CubeMapping_H__

#include "SdkSample.h"

using namespace Ogre;
using namespace OgreBites;

class _OgreSampleClassExport Sample_CubeMapping : public SdkSample, public RenderTargetListener
{
public:

Sample_CubeMapping()
{
    mInfo["Title"] = "Cube Mapping";
    mInfo["Description"] = "Demonstrates the cube mapping feature where a wrap-around environment is reflected "
        "off of an object. Uses render-to-texture to create dynamic cubemaps.";
    mInfo["Thumbnail"] = "thumb_cubemap.png";
    mInfo["Category"] = "Unsorted";
}

bool frameRenderingQueued(const FrameEvent& evt) override
{
    return SdkSample::frameRenderingQueued(evt);      // don't forget the parent updates!
}

std::vector<Light*> tmpLights;
std::vector<Entity*> tmpEntities;
std::vector<SceneNode*> tmpSceneNodes;

bool mouseReleased(const MouseButtonEvent& evt) override
{
	if (mTrayMgr->mouseReleased(evt))
		return true;
	if (evt.button == BUTTON_MIDDLE)
	{
		/*if (tmpEntities.size() != 0)
		{
			Entity* tmpEntity = tmpEntities[0];
			MaterialPtr tmpMaterial = tmpEntity->getSubEntity(0)->getMaterial();
			if (tmpMaterial)
			{
				for (size_t t = 0; t < tmpMaterial->getNumTechniques(); t++)
				{
					Technique* tmpTechnique = tmpMaterial->getTechnique(t);
					for (size_t p = 0; p < tmpTechnique->getNumPasses(); p++)
					{
						Pass* tmpPass = tmpTechnique->getPass(p);
						tmpPass->_recalculateHash();
					}
				}
			}
		}*/



		/*mSceneMgr->destroyAllAnimations();

		if (mSceneMgr->mRenderQueue)
			mSceneMgr->mRenderQueue->clear(true);

		mSceneMgr->mAutoParamDataSource.reset(mSceneMgr->createAutoParamDataSource());*/



		/*static bool tmp = false;
		if (!tmp)
		{
			MaterialManager::getSingleton().setDefaultTextureFiltering(Ogre::TFO_NONE);
			LogManager::getSingleton().logMessage("ROBINSWITCH: Switched filtering to TFO_NONE");
			//MaterialManager::getSingleton().setDefaultAnisotropy(1);
		}
		else
		{
			MaterialManager::getSingleton().setDefaultTextureFiltering(Ogre::TFO_ANISOTROPIC);
			LogManager::getSingleton().logMessage("ROBINSWITCH: Switched filtering to TFO_ANISOTROPIC");
			//MaterialManager::getSingleton().setDefaultAnisotropy(16);
		}
		tmp = !tmp;*/





		std::vector<GpuProgramPtr> tmpGPUPrograms;
		ResourceManager::ResourceMapIterator tmpItr = GpuProgramManager::getSingleton().getResourceIterator();
		if (tmpItr.begin() == tmpItr.end())
			tmpItr = HighLevelGpuProgramManager::getSingleton().getResourceIterator();
		for (ResourceManager::ResourceMapIterator::const_iterator i = tmpItr.begin(); i != tmpItr.end(); ++i)
		{
			const ResourcePtr tmpResourcePtr = i->second;

			const Ogre::String& tmpName = tmpResourcePtr->getName();
			if (/*tmpName == "Test_AlphaRejection_VS" ||*/
				tmpName == "Test_AlphaRejection_PS")
			{
				GpuProgramPtr tmpGpuProgram = GpuProgramManager::getSingleton().getByName(tmpName);
				if (tmpGpuProgram)
					tmpGPUPrograms.push_back(tmpGpuProgram);
			}
		}

		for (size_t i = 0; i < tmpGPUPrograms.size(); i++)
			tmpGPUPrograms[i]->reload();
	}

	if (evt.button == BUTTON_RIGHT)
	{
		if (tmpLights.size() == 0)
		{
			LogManager::getSingleton().logMessage("ROBINSWITCH: Created lights");
			SceneNode* tmpParent = mSceneMgr->getRootSceneNode()->createChildSceneNode();
			tmpParent->setPosition(Vector3(0.0f, 2.0f, 0.0f));
			// Create the lights
			for (int i = 0; i < 300; i++)
			{
				Light* tmpLight = mSceneMgr->createLight();
				tmpParent->attachObject(tmpLight);

				tmpLight->setCastShadows(false);
				tmpLight->setType(Light::LT_POINT);
				tmpLight->setDiffuseColour(ColourValue(0.01f, 0.01f, 0.01f));
				tmpLight->setSpecularColour(ColourValue(0.01f, 0.01f, 0.01f));
				tmpLight->setAttenuation(1000000.0f, 0.0f, 0.01f, 0.0f);

				tmpLights.push_back(tmpLight);
			}
		}
		else
		{
			LogManager::getSingleton().logMessage("ROBINSWITCH: Destroyed lights");
			SceneNode* tmpParent = tmpLights[0]->getParentSceneNode();
			// Destroy the lights
			for (int i = 0; i < (int)tmpLights.size(); i++)
			{
				Light* tmpLight = tmpLights[i];

				tmpLight->detachFromParent();
				mSceneMgr->destroyLight(tmpLight);
			}
			mSceneMgr->destroySceneNode(tmpParent);

			// Clear the list of lights
			tmpLights.clear();
		}
	}

	return true;
}

protected:

// Helper functions taken from my main project
Vector3 __GetPosition(Camera* obj)
{
	return obj->getDerivedPosition();
}
void __SetPosition(Camera* obj, const Vector3& vec)
{
	Node* tmpNode = obj->getParentNode();
	tmpNode->setPosition(vec);
}
void __LookAt(Camera* obj, const Vector3& pos)
{
	Vector3 tmpPosition = __GetPosition(obj);
	__SetDirection(obj, (pos - tmpPosition).normalisedCopy(), true);
}
Quaternion __GetOrientation(Camera* obj)
{
	return obj->getDerivedOrientation();
}
void __SetDirection(Camera* obj, const Vector3& vec, bool yawFixed)
{
	Quaternion mOrientation = Quaternion::IDENTITY;

	if (vec == Vector3::ZERO) return;

	Vector3 zAdjustVec = -vec;
	zAdjustVec.normalise();

	Quaternion targetWorldOrientation;

	if (yawFixed)
	{
		Vector3 mYawFixedAxis = Vector3::UNIT_Y;

		Vector3 xVec = mYawFixedAxis.crossProduct(zAdjustVec);
		xVec.normalise();

		Vector3 yVec = zAdjustVec.crossProduct(xVec);
		yVec.normalise();

		targetWorldOrientation.FromAxes(xVec, yVec, zAdjustVec);
	}
	else
	{
		Quaternion mRealOrientation = __GetOrientation(obj);

		Vector3 axes[3];
		mRealOrientation.ToAxes(axes);
		Quaternion rotQuat;
		if ((axes[2] + zAdjustVec).squaredLength() < 0.00005f)
		{
			rotQuat.FromAngleAxis(Radian(Math::PI), axes[1]);
		}
		else
		{
			rotQuat = axes[2].getRotationTo(zAdjustVec);
		}
		targetWorldOrientation = rotQuat * mRealOrientation;
	}

	mOrientation = targetWorldOrientation;

	SceneNode* tmpNode = obj->getParentSceneNode();
	tmpNode->_setDerivedOrientation(mOrientation);
	tmpNode->_update(true, false);
}

void setupContent() override
{
	const float tmpSize = 70.0f;
	const float tmpStepSize = 2.0f;
	for (float x = -tmpSize; x < tmpSize; x += tmpStepSize)
	{
		for (float y = -tmpSize; y < tmpSize; y += tmpStepSize)
		{
			Entity* tmpEntity = mSceneMgr->createEntity("foliage_bush0.mesh");
			SceneNode* tmpNode = mSceneMgr->getRootSceneNode()->createChildSceneNode();
			tmpNode->attachObject(tmpEntity);
			tmpNode->setPosition(Vector3(x, 0.0f, y));

			tmpEntities.push_back(tmpEntity);
			tmpSceneNodes.push_back(tmpNode);
		}
	}

	float tmpDistance = 2.0f;
	__SetPosition(mCamera, Vector3(tmpDistance, tmpDistance, tmpDistance));
	__LookAt(mCamera, Vector3::ZERO);
	mCamera->setNearClipDistance(0.01f);
	mCamera->setFarClipDistance(1000.0f);

	mCameraMan->setStyle(CS_MANUAL);

	MaterialManager::getSingleton().setDefaultTextureFiltering(Ogre::TFO_ANISOTROPIC);
	MaterialManager::getSingleton().setDefaultAnisotropy(16);
}

void cleanupContent() override
{
	if (tmpLights.size() != 0)
	{
		SceneNode* tmpParent = tmpLights[0]->getParentSceneNode();
		// Destroy the lights
		for (int i = 0; i < (int)tmpLights.size(); i++)
		{
			Light* tmpLight = tmpLights[i];

			tmpLight->detachFromParent();
			mSceneMgr->destroyLight(tmpLight);
		}
		mSceneMgr->destroySceneNode(tmpParent);

		// Clear the list of lights
		tmpLights.clear();
	}

	for (size_t i = 0; i < tmpEntities.size(); i++)
	{
		tmpEntities[i]->detachFromParent();
		mSceneMgr->destroyEntity(tmpEntities[i]);
	}
	tmpEntities.clear();

	for (size_t i = 0; i < tmpSceneNodes.size(); i++)
	{
		mSceneMgr->destroySceneNode(tmpSceneNodes[i]);
	}
	tmpSceneNodes.clear();
}
};

#endif

I am out of ideas for now though...

Re: Light Creation/Deletion Performance Bug

Post by rpgplayerrobin »

Update:

I made a test to remove all calculations from the shader, simply only using all light variables like "color.x += lightDiffuse0.x * 0.00001;" etc.
Even with that code, the bug still exists, so it has nothing to do with invalid lighting calculations in any way in the shader.

I also tried to instead use arrays (such as light_diffuse_colour_array) for all lights instead of having 20 different variables for 20 lights, but it had no impact on performance and no impact on the bug.

I also went through debugging the graphics more in detail and I made sure that everything was 100% exactly the same when rendering, even if one snapshot has 800 FPS and the other has 160 FPS.
They used exactly the same buffers for it, even the huge constant buffer was exactly the same (used a diff tool to see that there were no differences between them).
I did the same thing for the entire callstack of all D3D11 function calls, and both were exactly the same.
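
As an illustration of the buffer comparison described above, here is a minimal sketch of a byte-for-byte check between two captured constant buffers. The function name `firstDifference` and the sentinel `kIdentical` are hypothetical, and the captures are assumed to already be available as plain byte vectors (for example, exported from a graphics debugger).

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Sentinel returned when both captures are byte-for-byte identical.
const std::size_t kIdentical = static_cast<std::size_t>(-1);

// Returns the offset of the first differing byte between two captures,
// or kIdentical when there is none. A size mismatch is reported as a
// difference at the end of the shorter capture.
std::size_t firstDifference(const std::vector<std::uint8_t>& a,
                            const std::vector<std::uint8_t>& b)
{
    const std::size_t common = a.size() < b.size() ? a.size() : b.size();
    for (std::size_t i = 0; i < common; ++i)
    {
        if (a[i] != b[i])
            return i;
    }
    if (a.size() != b.size())
        return common;
    return kIdentical;
}
```

If this returns `kIdentical` for the "fast" and the "slow" capture, the data handed to the GPU is the same in both cases, which is what the diff tool showed here.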

I noticed something else though. If I have 300 lights out and then reload the shader, the FPS goes to its maximum value (as expected), but if I then remove the lights the FPS goes down to the minimum amount again. That means that it is not that the shader is getting slower with more lights, it is instead that it gets slower by having its shader parameters changed from its first frame, for some reason.

I dug a bit more and thought that the performance issue might be in setting the constants on the shader (in D3D11RenderSystem::bindGpuProgramParameters), but I saw no performance difference at all even when I forced the constants there (only for my shader) to skip the lock/unlock and write that updates the shader.
In the same function I added code to create a predefined buffer and to switch to using it for the fragment shader when Capslock is toggled, and that showed that the bug occurs as soon as the data is changed.

The interesting thing here is that if that predefined buffer has the same content as the current shader buffer, and I switch between them, the performance is not affected at all, even if they are actually two different D3D11 buffers behind the scenes.
But as soon as the predefined buffer is different from the current shader buffer, and you switch to use it, the performance issue appears.

So here are some scenarios of this bug (with 4000 models with the same shader and material):
1.
The scene has 300 lights. The FPS is 600.
I remove the lights, and the FPS falls to 160.
I reload the shader by pressing a button, the FPS goes back up to 600.
If I add 300 lights, the FPS falls to 160 again.

2 (opposite of 1).
The scene has 0 lights. The FPS is 600.
I add 300 lights, and the FPS falls to 160.
I reload the shader by pressing a button, the FPS goes back up to 600.
If I remove the 300 lights, the FPS falls to 160 again.

3 (predefined buffer).
The scene has 300 lights. The FPS is 600.
I toggle Capslock to create the predefined buffer from the current constants buffer. The performance is unaffected.
I untoggle Capslock to keep using the standard buffer. The performance is unaffected.
I remove the lights. The FPS falls to 160.
I toggle Capslock to use the predefined buffer where there were lights, but the FPS is still stuck at 160.
I untoggle Capslock to keep using the standard buffer.
I reload the shader by pressing a button, the FPS goes back up to 600.
I toggle Capslock to use the predefined buffer where there were lights. The FPS falls to 160.

This means that the actual shader code and constant buffer data have no real effect on the performance, because simply reloading the shader brings it back up to maximum FPS even with the new constant buffer data.

This means it has to do with deep D3D11/D3D9 specific code.
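
The three scenarios can be summarised as a latch: the shader stays fast until it is bound with constant data that differs from what it had when it was last (re)loaded, and from then on it stays slow until the next reload, even if the old data is bound again. The toy model below only encodes those observations. The class `ObservedShaderModel` and its methods are hypothetical names, and the model says nothing about what the driver actually does internally.

```cpp
#include <string>

// Toy model of the observed behaviour only. "Contents" stands in for the
// bytes of the constant buffer that is bound for the shader.
class ObservedShaderModel
{
public:
    explicit ObservedShaderModel(const std::string& initialContents)
        : mBaseline(initialContents), mBound(initialContents), mSlow(false) {}

    // Binding data that differs from the baseline latches the slow state.
    // Binding the baseline data again does not clear it.
    void bindConstants(const std::string& contents)
    {
        mBound = contents;
        if (mBound != mBaseline)
            mSlow = true;
    }

    // Reloading the shader clears the latch and makes the currently bound
    // data the new baseline.
    void reloadShader()
    {
        mBaseline = mBound;
        mSlow = false;
    }

    bool isSlow() const { return mSlow; }

private:
    std::string mBaseline; // data seen at the last (re)load
    std::string mBound;    // data bound right now
    bool mSlow;            // true once the data has changed since the last (re)load
};
```

Replaying scenario 3 against this model gives the same fast/slow sequence as the measurements: binding the predefined "300 lights" data after the lights were removed does not bring the speed back, while a reload does.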

Re: Light Creation/Deletion Performance Bug

Post by rpgplayerrobin »

I think I have found the problem, but I am not sure how to solve it.
It has to do with how constants are updated to the shader.

If I temporarily use a staging buffer for just one frame for each object (or just for the shader that has the issue) and copy its content over using CopyResource (for D3D11), the performance/FPS bug goes away, even though the staging buffer is no longer used after that frame.
It is like hitting a refresh button.
It is as if the shader had some bad cache that needed cleaning, and temporarily using the staging buffer cleaned it, for reasons I do not understand.

However, if I continue changing the constant buffer (by for example creating/destroying lights), the same performance bug re-appears again, which means that I have to use the staging buffer for a frame again to fix it again.

To test it yourself, go into D3D11RenderSystem::bindGpuProgramParameters in OgreD3D11RenderSystem.cpp and find the below code:

Code: Select all

auto& cbuffer = updateDefaultUniformBuffer(gptype, params->getConstantList());
buffers[0] = static_cast<D3D11HardwareBuffer*>(cbuffer.get())->getD3DBuffer();

Then replace it with this:

Code: Select all

auto& cbuffer = updateDefaultUniformBuffer(gptype, params->getConstantList());

if (IsCapslockToggled())
{
	// Create a new staging buffer and copy the data to that, then copy the resource to our ubo
	// (This code creates one per frame, which is most likely very bad, but as a test it works)
	ComPtr<ID3D11Buffer> d3d11Buffer;
	D3D11_BUFFER_DESC stagingDesc = {};
	//stagingDesc.ByteWidth = static_cast<UINT>(params->getConstantList().size());
	stagingDesc.ByteWidth = static_cast<UINT>(cbuffer->getSizeInBytes()); // We need the exact same size as of the buffer for CopyResource to work
	stagingDesc.Usage = D3D11_USAGE_STAGING;
	stagingDesc.BindFlags = 0;
	stagingDesc.CPUAccessFlags = D3D11_CPU_ACCESS_WRITE;
	HRESULT hr = mDevice->CreateBuffer(&stagingDesc, nullptr, d3d11Buffer.ReleaseAndGetAddressOf());

	D3D11_MAPPED_SUBRESOURCE mappedResource = {};
	if (SUCCEEDED(hr))
		hr = mDevice.GetImmediateContext()->Map(d3d11Buffer.Get(), 0, D3D11_MAP_WRITE, 0, &mappedResource);

	if (FAILED(hr))
	{
		MessageBoxA(NULL, "Error", "Error", MB_OK | MB_TOPMOST | MB_SYSTEMMODAL);
	}
	else
	{
		// Zero the whole staging buffer, then write the current constants at its start
		memset(mappedResource.pData, 0, stagingDesc.ByteWidth);
		memcpy(mappedResource.pData, params->getConstantList().data(), params->getConstantList().size());
		mDevice.GetImmediateContext()->Unmap(d3d11Buffer.Get(), 0);

		mDevice.GetImmediateContext()->CopyResource(static_cast<D3D11HardwareBuffer*>(cbuffer.get())->getD3DBuffer(), d3d11Buffer.Get());
	}
}

buffers[0] = static_cast<D3D11HardwareBuffer*>(cbuffer.get())->getD3DBuffer();

The code below is defined above the function to be able to use capslock:

Code: Select all

#include "windows.h"
bool IsCapslockToggled()
{
	return (GetKeyState(VK_CAPITAL) & 0x0001) != 0;
}

To test it, create and destroy a lot of lights (with my previous code) by pressing the right mouse button a couple of times, see that the FPS is now low (150 instead of 800), then toggle Capslock on and off and see the FPS go back up to 800.
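
To put those FPS figures in terms of time per frame, a small conversion helps: 800 FPS is 1.25 ms per frame and 150 FPS is roughly 6.67 ms, so the bug adds about 5.4 ms to every frame. A minimal sketch of that arithmetic (the function names are only illustrative):

```cpp
// Convert a frame rate into milliseconds spent per frame.
double frameTimeMs(double fps)
{
    return 1000.0 / fps;
}

// Extra milliseconds each frame costs once the rate drops from fastFps to slowFps.
double addedFrameTimeMs(double fastFps, double slowFps)
{
    return frameTimeMs(slowFps) - frameTimeMs(fastFps);
}
```

Comparing frame times instead of FPS makes drops measured in different scenes easier to compare, because FPS is not linear in the amount of work per frame.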

When I tried to always use a staging buffer (with better code than the one above of course), the FPS was terrible, so always using a staging buffer seems to be a no-go.

This seems like an issue that can only be solved by someone who knows rendering systems in more depth.
And again, this happens on two different computers with two different graphics cards (both Nvidia, though), so it is not a minor issue that only happens on my computer.
Even if this gets fixed on D3D11, the issue still exists on D3D9. I have not tried OpenGL, so it might exist there as well.


Other than that, I also tried other possible solutions.

There is a list of all different shader types in RenderSystem::updateDefaultUniformBuffer, like fragment, vertex, compute, etc.
For each of those types, say fragment, the buffer (HardwareBufferPtr) is created only once, not once per object or material.
If a new shader needs a larger size, the buffer is simply recreated with the largest size.
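
The scheme described above (one shared buffer per program type that only ever grows) can be sketched roughly as follows. This is not Ogre's actual implementation: `ProgramType`, `FakeBuffer` and `DefaultUniformBuffers` are hypothetical stand-ins, and a plain byte vector plays the role of the GPU buffer.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <map>
#include <vector>

// Hypothetical stand-ins; these are not Ogre types.
enum class ProgramType { Vertex, Fragment, Compute };

struct FakeBuffer
{
    std::vector<std::uint8_t> bytes; // plays the role of the GPU-side storage
    int timesCreated = 0;            // how often the buffer was (re)created
};

class DefaultUniformBuffers
{
public:
    // Returns the single buffer for this program type. The buffer is
    // recreated only when the incoming constants do not fit, and the
    // constants are then copied to its start.
    FakeBuffer& update(ProgramType type, const std::vector<std::uint8_t>& constants)
    {
        FakeBuffer& buf = mBuffers[type];
        if (buf.timesCreated == 0 || buf.bytes.size() < constants.size())
        {
            buf.bytes.assign(constants.size(), 0); // "recreate" at the larger size
            ++buf.timesCreated;
        }
        if (!constants.empty())
            std::memcpy(buf.bytes.data(), constants.data(), constants.size());
        return buf;
    }

private:
    std::map<ProgramType, FakeBuffer> mBuffers; // one entry per program type
};
```

Note that in this sketch a smaller set of constants written after a larger one leaves the old bytes in the tail of the buffer, since the buffer never shrinks.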

Instead of using it only once per shader type, I changed the code to create one of them for each render/batch in that frame.
This did make the performance on a stress-test scene a bit worse though (53 FPS down to 50 FPS in Release), and it has no effect on the bug at all, so this bug has nothing to do with just using one constant buffer per shader type.

I also tried to triple buffer the constant buffer for each object, but it did not solve the bug in any way.
I also tried to triple buffer the staging buffer, but that did not work either.
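
For readers unfamiliar with the term, triple buffering here means keeping three copies of the buffer and writing to them round-robin, so that the copy written this frame is never one that was handed out in the previous two frames. A minimal sketch of that rotation, with a hypothetical `BufferRing` class and byte vectors in place of real D3D11 buffers:

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical sketch: N copies of a constant buffer used round-robin.
template <std::size_t N>
class BufferRing
{
public:
    // Advances to the next slot, stores the new constants there and
    // returns the index of the slot that was written.
    std::size_t write(const std::vector<std::uint8_t>& constants)
    {
        mCurrent = (mCurrent + 1) % N;
        mSlots[mCurrent] = constants;
        return mCurrent;
    }

    const std::vector<std::uint8_t>& slot(std::size_t index) const
    {
        return mSlots[index];
    }

private:
    std::array<std::vector<std::uint8_t>, N> mSlots;
    std::size_t mCurrent = N - 1; // so that the first write lands in slot 0
};
```

With `BufferRing<3>`, the fourth write reuses slot 0 while slots 1 and 2 still hold the data from the two preceding writes.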

I have also tried to change the compile flags to skip validation or optimization, and also enabled the debug layer, but that did not help at all.
I could not find anything in Nsight that was useful after enabling the debug layer either.

sercero
Bronze Sponsor
Posts: 487
Joined: Sun Jan 18, 2015 4:20 pm
Location: Buenos Aires, Argentina
x 170

Re: Light Creation/Deletion Performance Bug

Post by sercero »

Do you have a package with the .exe and the assets that you used to test this?

I can try it on an AMD card...

Re: Light Creation/Deletion Performance Bug

Post by rpgplayerrobin »

Nice!

Here is a zip file containing everything, simply start the Start.bat file in the base directory after you have extracted it:
https://drive.google.com/file/d/1D-5MnI ... sp=sharing

When you start it, if the Ogre dialog does not come up, exit and delete the directory at C:\YOUR_USERNAME\rwn00\Documents\OGRE Sample Browser.

When the Ogre Dialog comes up, select to use Direct3D11, VSync Off and 1920x1080.
Go into the CubeMapping sample (sample 12/54).

Note the FPS. In that scene (using a lot of scaled ogre heads with a ninja texture on them) I have around 280 FPS.
Toggle on Numlock to see the models become red. Notice the FPS, I get around 190.
Toggle off Numlock to see the models normally again. Notice that the FPS is still 190 even if the previous constants have been set again.
Toggle on Capslock and instantly toggle it off again. Notice that the FPS comes back to 280.

Some scenes are worse than this. For some meshes I get 800 FPS before the bug and 150 FPS after the bug, but those are a bit demanding on the GPU, so I chose to use the ogrehead mesh instead, which gives a somewhat lower FPS.

What is actually happening in the code is that I call setNamedConstant on all constants in the shader for the material, every frame, setting them to "ColourValue::Black" normally or to "ColourValue(0.5f, 0.5f, 0.5f, 1.0f)" when numlock is active (which turns the models red because of how the shader works).
So in short, lights are not even needed to reproduce this bug.

When capslock is toggled on, the hacky fix from my previous post is applied, which fixes the FPS bug (at least temporarily, until you start using numlock again).

If you press the middle mouse button, the fragment shader is recompiled, which also fixes the FPS bug (at least temporarily, until you start using numlock again).

You can also toggle on Numlock and see the FPS go down to 190 and then use capslock or the middle mouse button to fix the FPS bug as well, which shows that making the models/shader look red is not what is causing the FPS drop.

Re: Light Creation/Deletion Performance Bug

Post by sercero »

Note the FPS. In that scene (using a lot of scaled ogre heads with a ninja texture on them) I have around 280 FPS.
Toggle on Numlock to see the models become red. Notice the FPS, I get around 190.
Toggle off Numlock to see the models normally again. Notice that the FPS is still 190 even if the previous constants have been set again.
Toggle on Capslock and instantly toggle it off again. Notice that the FPS comes back to 280.

Results:

  • Toggling on/off Numlock does barely anything (FPS varies between 154 and 152)
  • Toggling on/off CapsLock changes the FPS from 150 to 115

The video card is just an integrated GPU in the processor:
Video Chipset: AMD Radeon Vega
Video Chipset Codename: Picasso
Video Memory: 2048 MBytes of DDR4 SDRAM
Driver Manufacturer: Advanced Micro Devices, Inc.
Driver Description: AMD Radeon(TM) RX Vega 11 Graphics
Driver Version: 30.0.15021.11005

Re: Light Creation/Deletion Performance Bug

Post by rpgplayerrobin »

Toggling on/off Numlock does barely anything (FPS varies between 154 and 152)

I guess the starting FPS did not change at all when you first turned Numlock on?
If numlock is already active on startup, the bug will already have happened, so toggling numlock has no effect on the FPS at all.

Also, I guess numlock at least changed the color of the models to red?

Toggling on/off CapsLock changes the FPS from 150 to 115

Leaving capslock on will decrease the FPS. The instruction was to toggle it on and then instantly off again, so that the shader resets its internal cache, and then to measure the new FPS once it is off.

Re: Light Creation/Deletion Performance Bug

Post by rpgplayerrobin »

I have now created the smallest amount of code to reproduce the issue without the need of any external resources.
It creates the shader and materials in code now, and therefore only needs the code below to replace the code in CubeMapping.h (Samples\Simple\include\CubeMapping.h):

Code: Select all

#ifndef __CubeMapping_H__
#define __CubeMapping_H__

#include "SdkSample.h"
#include "windows.h"

using namespace Ogre;
using namespace OgreBites;

class _OgreSampleClassExport Sample_CubeMapping : public SdkSample, public RenderTargetListener
{
public:

Sample_CubeMapping()
{
	mInfo["Title"] = "Cube Mapping";
	mInfo["Description"] = "Demonstrates the cube mapping feature where a wrap-around environment is reflected "
		"off of an object. Uses render-to-texture to create dynamic cubemaps.";
	mInfo["Thumbnail"] = "thumb_cubemap.png";
	mInfo["Category"] = "Unsorted";
}

std::vector<Entity*> tmpEntities;
std::vector<SceneNode*> tmpSceneNodes;

bool IsNumlockToggled()
{
	return (GetKeyState(VK_NUMLOCK) & 0x0001) != 0;
}

bool frameRenderingQueued(const FrameEvent& evt) override
{
	bool r = SdkSample::frameRenderingQueued(evt);

	ColourValue col = ColourValue::Black;
	if (IsNumlockToggled())
		col = ColourValue(0.5f, 0.5f, 0.5f, 1.0f);

	if (tmpEntities.size() != 0)
	{
		Entity* e = tmpEntities[0];
		MaterialPtr m = e->getSubEntity(0)->getMaterial();

		GpuProgramParametersSharedPtr gpup = m->getTechnique(0)->getPass(0)->getFragmentProgramParameters();
		for (int i = 0; i <= 19; i++)
		{
			std::string is = std::to_string(i);

			gpup->setNamedConstant("lightDiffuse" + is, col);
			gpup->setNamedConstant("lightSpecular" + is, col);
			gpup->setNamedConstant("lightAttenuation" + is, col);
			gpup->setNamedConstant("lightPosition" + is, col);
			gpup->setNamedConstant("lightDirection" + is, col);
			gpup->setNamedConstant("lightSpotParams" + is, col);
		}
	}

	return r;
}

bool mouseReleased(const MouseButtonEvent& evt) override
{
	if (mTrayMgr->mouseReleased(evt))
		return true;

	if (evt.button == BUTTON_MIDDLE)
	{
		std::vector<GpuProgramPtr> tmpGPUPrograms;
		ResourceManager::ResourceMapIterator tmpItr = GpuProgramManager::getSingleton().getResourceIterator();
		if (tmpItr.begin() == tmpItr.end())
			tmpItr = HighLevelGpuProgramManager::getSingleton().getResourceIterator();
		for (ResourceManager::ResourceMapIterator::const_iterator i = tmpItr.begin(); i != tmpItr.end(); ++i)
		{
			const ResourcePtr tmpResourcePtr = i->second;

			const Ogre::String& tmpName = tmpResourcePtr->getName();
			if (tmpName == "TestBug_PS")
			{
				GpuProgramPtr tmpGpuProgram = GpuProgramManager::getSingleton().getByName(tmpName);
				if (tmpGpuProgram)
					tmpGPUPrograms.push_back(tmpGpuProgram);
			}
		}

		for (size_t i = 0; i < tmpGPUPrograms.size(); i++)
			tmpGPUPrograms[i]->reload();
	}

	return true;
}

protected:

// Helper functions
Vector3 __GetPosition(Camera* obj)
{
	return obj->getDerivedPosition();
}
void __SetPosition(Camera* obj, const Vector3& vec)
{
	Node* tmpNode = obj->getParentNode();
	tmpNode->setPosition(vec);
}
void __LookAt(Camera* obj, const Vector3& pos)
{
	Vector3 tmpPosition = __GetPosition(obj);
	__SetDirection(obj, (pos - tmpPosition).normalisedCopy(), true);
}
Quaternion __GetOrientation(Camera* obj)
{
	return obj->getDerivedOrientation();
}
void __SetDirection(Camera* obj, const Vector3& vec, bool yawFixed)
{
	Quaternion mOrientation = Quaternion::IDENTITY;

	if (vec == Vector3::ZERO) return;

	Vector3 zAdjustVec = -vec;
	zAdjustVec.normalise();

	Quaternion targetWorldOrientation;

	if (yawFixed)
	{
		Vector3 mYawFixedAxis = Vector3::UNIT_Y;

		Vector3 xVec = mYawFixedAxis.crossProduct(zAdjustVec);
		xVec.normalise();

		Vector3 yVec = zAdjustVec.crossProduct(xVec);
		yVec.normalise();

		targetWorldOrientation.FromAxes(xVec, yVec, zAdjustVec);
	}
	else
	{
		Quaternion mRealOrientation = __GetOrientation(obj);

		Vector3 axes[3];
		mRealOrientation.ToAxes(axes);
		Quaternion rotQuat;
		if ((axes[2] + zAdjustVec).squaredLength() < 0.00005f)
		{
			rotQuat.FromAngleAxis(Radian(Math::PI), axes[1]);
		}
		else
		{
			rotQuat = axes[2].getRotationTo(zAdjustVec);
		}
		targetWorldOrientation = rotQuat * mRealOrientation;
	}

	mOrientation = targetWorldOrientation;

	SceneNode* tmpNode = obj->getParentSceneNode();
	tmpNode->_setDerivedOrientation(mOrientation);
	tmpNode->_update(true, false);
}

void setupContent() override
{
	// Create the vertex shader and its material
	/*

	vertex_program TestBug_VS hlsl
	{
		//source X_VS.hlsl
		entry_point main_vs
		target vs_5_0

		default_params
		{
			param_named_auto modelViewProj worldviewproj_matrix
		}
	}

	*/

	Ogre::String renderSystemName = mRoot->getRenderSystem()->getName();
	bool isUsingD3D9 = renderSystemName == "Direct3D9 Rendering Subsystem";

	// Create the vertex program
	Ogre::GpuProgramManager& gpuProgManager = Ogre::GpuProgramManager::getSingleton();
	auto vertexProgram = gpuProgManager.createProgram(
		"TestBug_VS",
		Ogre::ResourceGroupManager::DEFAULT_RESOURCE_GROUP_NAME,
		"hlsl",
		Ogre::GPT_VERTEX_PROGRAM);
	vertexProgram->setParameter("entry_point", "main_vs");

	if(isUsingD3D9)
		vertexProgram->setParameter("target", "vs_3_0");
	else
		vertexProgram->setParameter("target", "vs_5_0");

	Ogre::String tmpSource =
R"(float4x4 modelViewProj;

struct VS_OUTPUT
{
	float3 oVertexPos : TEXCOORD0;
	float2 oUV : TEXCOORD1;
};

VS_OUTPUT main_vs(float4 position : POSITION,
	out float4 oPosition : POSITION,
	float2 iUV : TEXCOORD0)
{
	VS_OUTPUT Out;

	oPosition = mul(modelViewProj, position);
	Out.oVertexPos = position.xyz;
	Out.oUV = iUV;

	return Out;
})";
	vertexProgram->setSource(tmpSource);
	vertexProgram->load();

	// Add default parameters to the vertex program
	auto vertexParams = vertexProgram->getDefaultParameters();
	vertexParams->setNamedAutoConstant("modelViewProj", Ogre::GpuProgramParameters::ACT_WORLDVIEWPROJ_MATRIX);





	// Create the fragment shader and its material
	/*

	fragment_program TestBug_PS hlsl
	{
		//source X_PS.hlsl
		entry_point main_ps
		target ps_5_0

		default_params
		{
			param_named lightDiffuse0 float4 0.0 0.0 0.0 0.0
			param_named lightDiffuse1 float4 0.0 0.0 0.0 0.0
			param_named lightDiffuse2 float4 0.0 0.0 0.0 0.0
			param_named lightDiffuse3 float4 0.0 0.0 0.0 0.0
			param_named lightDiffuse4 float4 0.0 0.0 0.0 0.0
			param_named lightDiffuse5 float4 0.0 0.0 0.0 0.0
			param_named lightDiffuse6 float4 0.0 0.0 0.0 0.0
			param_named lightDiffuse7 float4 0.0 0.0 0.0 0.0
			param_named lightDiffuse8 float4 0.0 0.0 0.0 0.0
			param_named lightDiffuse9 float4 0.0 0.0 0.0 0.0
			param_named lightDiffuse10 float4 0.0 0.0 0.0 0.0
			param_named lightDiffuse11 float4 0.0 0.0 0.0 0.0
			param_named lightDiffuse12 float4 0.0 0.0 0.0 0.0
			param_named lightDiffuse13 float4 0.0 0.0 0.0 0.0
			param_named lightDiffuse14 float4 0.0 0.0 0.0 0.0
			param_named lightDiffuse15 float4 0.0 0.0 0.0 0.0
			param_named lightDiffuse16 float4 0.0 0.0 0.0 0.0
			param_named lightDiffuse17 float4 0.0 0.0 0.0 0.0
			param_named lightDiffuse18 float4 0.0 0.0 0.0 0.0
			param_named lightDiffuse19 float4 0.0 0.0 0.0 0.0

			param_named lightSpecular0 float4 0.0 0.0 0.0 0.0
			param_named lightSpecular1 float4 0.0 0.0 0.0 0.0
			param_named lightSpecular2 float4 0.0 0.0 0.0 0.0
			param_named lightSpecular3 float4 0.0 0.0 0.0 0.0
			param_named lightSpecular4 float4 0.0 0.0 0.0 0.0
			param_named lightSpecular5 float4 0.0 0.0 0.0 0.0
			param_named lightSpecular6 float4 0.0 0.0 0.0 0.0
			param_named lightSpecular7 float4 0.0 0.0 0.0 0.0
			param_named lightSpecular8 float4 0.0 0.0 0.0 0.0
			param_named lightSpecular9 float4 0.0 0.0 0.0 0.0
			param_named lightSpecular10 float4 0.0 0.0 0.0 0.0
			param_named lightSpecular11 float4 0.0 0.0 0.0 0.0
			param_named lightSpecular12 float4 0.0 0.0 0.0 0.0
			param_named lightSpecular13 float4 0.0 0.0 0.0 0.0
			param_named lightSpecular14 float4 0.0 0.0 0.0 0.0
			param_named lightSpecular15 float4 0.0 0.0 0.0 0.0
			param_named lightSpecular16 float4 0.0 0.0 0.0 0.0
			param_named lightSpecular17 float4 0.0 0.0 0.0 0.0
			param_named lightSpecular18 float4 0.0 0.0 0.0 0.0
			param_named lightSpecular19 float4 0.0 0.0 0.0 0.0

			param_named lightAttenuation0 float4 0.0 0.0 0.0 0.0
			param_named lightAttenuation1 float4 0.0 0.0 0.0 0.0
			param_named lightAttenuation2 float4 0.0 0.0 0.0 0.0
			param_named lightAttenuation3 float4 0.0 0.0 0.0 0.0
			param_named lightAttenuation4 float4 0.0 0.0 0.0 0.0
			param_named lightAttenuation5 float4 0.0 0.0 0.0 0.0
			param_named lightAttenuation6 float4 0.0 0.0 0.0 0.0
			param_named lightAttenuation7 float4 0.0 0.0 0.0 0.0
			param_named lightAttenuation8 float4 0.0 0.0 0.0 0.0
			param_named lightAttenuation9 float4 0.0 0.0 0.0 0.0
			param_named lightAttenuation10 float4 0.0 0.0 0.0 0.0
			param_named lightAttenuation11 float4 0.0 0.0 0.0 0.0
			param_named lightAttenuation12 float4 0.0 0.0 0.0 0.0
			param_named lightAttenuation13 float4 0.0 0.0 0.0 0.0
			param_named lightAttenuation14 float4 0.0 0.0 0.0 0.0
			param_named lightAttenuation15 float4 0.0 0.0 0.0 0.0
			param_named lightAttenuation16 float4 0.0 0.0 0.0 0.0
			param_named lightAttenuation17 float4 0.0 0.0 0.0 0.0
			param_named lightAttenuation18 float4 0.0 0.0 0.0 0.0
			param_named lightAttenuation19 float4 0.0 0.0 0.0 0.0

			param_named lightPosition0 float4 0.0 0.0 0.0 0.0
			param_named lightPosition1 float4 0.0 0.0 0.0 0.0
			param_named lightPosition2 float4 0.0 0.0 0.0 0.0
			param_named lightPosition3 float4 0.0 0.0 0.0 0.0
			param_named lightPosition4 float4 0.0 0.0 0.0 0.0
			param_named lightPosition5 float4 0.0 0.0 0.0 0.0
			param_named lightPosition6 float4 0.0 0.0 0.0 0.0
			param_named lightPosition7 float4 0.0 0.0 0.0 0.0
			param_named lightPosition8 float4 0.0 0.0 0.0 0.0
			param_named lightPosition9 float4 0.0 0.0 0.0 0.0
			param_named lightPosition10 float4 0.0 0.0 0.0 0.0
			param_named lightPosition11 float4 0.0 0.0 0.0 0.0
			param_named lightPosition12 float4 0.0 0.0 0.0 0.0
			param_named lightPosition13 float4 0.0 0.0 0.0 0.0
			param_named lightPosition14 float4 0.0 0.0 0.0 0.0
			param_named lightPosition15 float4 0.0 0.0 0.0 0.0
			param_named lightPosition16 float4 0.0 0.0 0.0 0.0
			param_named lightPosition17 float4 0.0 0.0 0.0 0.0
			param_named lightPosition18 float4 0.0 0.0 0.0 0.0
			param_named lightPosition19 float4 0.0 0.0 0.0 0.0

			param_named lightDirection0 float4 0.0 0.0 0.0 0.0
			param_named lightDirection1 float4 0.0 0.0 0.0 0.0
			param_named lightDirection2 float4 0.0 0.0 0.0 0.0
			param_named lightDirection3 float4 0.0 0.0 0.0 0.0
			param_named lightDirection4 float4 0.0 0.0 0.0 0.0
			param_named lightDirection5 float4 0.0 0.0 0.0 0.0
			param_named lightDirection6 float4 0.0 0.0 0.0 0.0
			param_named lightDirection7 float4 0.0 0.0 0.0 0.0
			param_named lightDirection8 float4 0.0 0.0 0.0 0.0
			param_named lightDirection9 float4 0.0 0.0 0.0 0.0
			param_named lightDirection10 float4 0.0 0.0 0.0 0.0
			param_named lightDirection11 float4 0.0 0.0 0.0 0.0
			param_named lightDirection12 float4 0.0 0.0 0.0 0.0
			param_named lightDirection13 float4 0.0 0.0 0.0 0.0
			param_named lightDirection14 float4 0.0 0.0 0.0 0.0
			param_named lightDirection15 float4 0.0 0.0 0.0 0.0
			param_named lightDirection16 float4 0.0 0.0 0.0 0.0
			param_named lightDirection17 float4 0.0 0.0 0.0 0.0
			param_named lightDirection18 float4 0.0 0.0 0.0 0.0
			param_named lightDirection19 float4 0.0 0.0 0.0 0.0

			param_named lightSpotParams0 float4 0.0 0.0 0.0 0.0
			param_named lightSpotParams1 float4 0.0 0.0 0.0 0.0
			param_named lightSpotParams2 float4 0.0 0.0 0.0 0.0
			param_named lightSpotParams3 float4 0.0 0.0 0.0 0.0
			param_named lightSpotParams4 float4 0.0 0.0 0.0 0.0
			param_named lightSpotParams5 float4 0.0 0.0 0.0 0.0
			param_named lightSpotParams6 float4 0.0 0.0 0.0 0.0
			param_named lightSpotParams7 float4 0.0 0.0 0.0 0.0
			param_named lightSpotParams8 float4 0.0 0.0 0.0 0.0
			param_named lightSpotParams9 float4 0.0 0.0 0.0 0.0
			param_named lightSpotParams10 float4 0.0 0.0 0.0 0.0
			param_named lightSpotParams11 float4 0.0 0.0 0.0 0.0
			param_named lightSpotParams12 float4 0.0 0.0 0.0 0.0
			param_named lightSpotParams13 float4 0.0 0.0 0.0 0.0
			param_named lightSpotParams14 float4 0.0 0.0 0.0 0.0
			param_named lightSpotParams15 float4 0.0 0.0 0.0 0.0
			param_named lightSpotParams16 float4 0.0 0.0 0.0 0.0
			param_named lightSpotParams17 float4 0.0 0.0 0.0 0.0
			param_named lightSpotParams18 float4 0.0 0.0 0.0 0.0
			param_named lightSpotParams19 float4 0.0 0.0 0.0 0.0
		}
	}

	*/

	// Create the fragment program
	auto fragmentProgram = gpuProgManager.createProgram(
		"TestBug_PS",
		Ogre::ResourceGroupManager::DEFAULT_RESOURCE_GROUP_NAME,
		"hlsl",
		Ogre::GPT_FRAGMENT_PROGRAM);
	fragmentProgram->setParameter("entry_point", "main_ps");

	if(isUsingD3D9)
		fragmentProgram->setParameter("target", "ps_3_0");
	else
		fragmentProgram->setParameter("target", "ps_5_0");

	tmpSource =
R"(float4 lightDiffuse0;
float4 lightDiffuse1;
float4 lightDiffuse2;
float4 lightDiffuse3;
float4 lightDiffuse4;
float4 lightDiffuse5;
float4 lightDiffuse6;
float4 lightDiffuse7;
float4 lightDiffuse8;
float4 lightDiffuse9;
float4 lightDiffuse10;
float4 lightDiffuse11;
float4 lightDiffuse12;
float4 lightDiffuse13;
float4 lightDiffuse14;
float4 lightDiffuse15;
float4 lightDiffuse16;
float4 lightDiffuse17;
float4 lightDiffuse18;
float4 lightDiffuse19;

float4 lightSpecular0;
float4 lightSpecular1;
float4 lightSpecular2;
float4 lightSpecular3;
float4 lightSpecular4;
float4 lightSpecular5;
float4 lightSpecular6;
float4 lightSpecular7;
float4 lightSpecular8;
float4 lightSpecular9;
float4 lightSpecular10;
float4 lightSpecular11;
float4 lightSpecular12;
float4 lightSpecular13;
float4 lightSpecular14;
float4 lightSpecular15;
float4 lightSpecular16;
float4 lightSpecular17;
float4 lightSpecular18;
float4 lightSpecular19;

float4 lightAttenuation0;
float4 lightAttenuation1;
float4 lightAttenuation2;
float4 lightAttenuation3;
float4 lightAttenuation4;
float4 lightAttenuation5;
float4 lightAttenuation6;
float4 lightAttenuation7;
float4 lightAttenuation8;
float4 lightAttenuation9;
float4 lightAttenuation10;
float4 lightAttenuation11;
float4 lightAttenuation12;
float4 lightAttenuation13;
float4 lightAttenuation14;
float4 lightAttenuation15;
float4 lightAttenuation16;
float4 lightAttenuation17;
float4 lightAttenuation18;
float4 lightAttenuation19;

float4 lightPosition0;
float4 lightPosition1;
float4 lightPosition2;
float4 lightPosition3;
float4 lightPosition4;
float4 lightPosition5;
float4 lightPosition6;
float4 lightPosition7;
float4 lightPosition8;
float4 lightPosition9;
float4 lightPosition10;
float4 lightPosition11;
float4 lightPosition12;
float4 lightPosition13;
float4 lightPosition14;
float4 lightPosition15;
float4 lightPosition16;
float4 lightPosition17;
float4 lightPosition18;
float4 lightPosition19;

float4 lightDirection0;
float4 lightDirection1;
float4 lightDirection2;
float4 lightDirection3;
float4 lightDirection4;
float4 lightDirection5;
float4 lightDirection6;
float4 lightDirection7;
float4 lightDirection8;
float4 lightDirection9;
float4 lightDirection10;
float4 lightDirection11;
float4 lightDirection12;
float4 lightDirection13;
float4 lightDirection14;
float4 lightDirection15;
float4 lightDirection16;
float4 lightDirection17;
float4 lightDirection18;
float4 lightDirection19;

float4 lightSpotParams0;
float4 lightSpotParams1;
float4 lightSpotParams2;
float4 lightSpotParams3;
float4 lightSpotParams4;
float4 lightSpotParams5;
float4 lightSpotParams6;
float4 lightSpotParams7;
float4 lightSpotParams8;
float4 lightSpotParams9;
float4 lightSpotParams10;
float4 lightSpotParams11;
float4 lightSpotParams12;
float4 lightSpotParams13;
float4 lightSpotParams14;
float4 lightSpotParams15;
float4 lightSpotParams16;
float4 lightSpotParams17;
float4 lightSpotParams18;
float4 lightSpotParams19;



)";

	if (isUsingD3D9)
	{
tmpSource +=
R"(sampler DiffuseMap : register(s0);)";
	}
	else
	{
tmpSource +=
R"(SamplerState DiffuseMap_state : register(s0);
Texture2D DiffuseMap : register(t0);)";
	}

tmpSource +=
R"(
float4 main_ps(float3 position : TEXCOORD0,
			   float2 uv : TEXCOORD1 ) : COLOR0
{
	float4 color = float4(0,0,0,0);
)";

	if (isUsingD3D9)
	{
tmpSource +=
R"(
	float3 DiffuseMapColour = tex2D(DiffuseMap, uv).xyz;
)";
	}
	else
	{
tmpSource +=
R"(
	float3 DiffuseMapColour = DiffuseMap.Sample(DiffuseMap_state, uv).xyz;
)";
	}

tmpSource +=
R"(
	color.xyz += DiffuseMapColour * 0.5;

color.w = 0.5;

float asdw = 0.0;

asdw += position.x * 0.00001;
asdw += position.y * 0.00001;
asdw += position.z * 0.00001;



const float multiplierAsd = 0.01;

asdw += lightDiffuse0.x * multiplierAsd;
asdw += lightDiffuse0.y * multiplierAsd;
asdw += lightDiffuse0.z * multiplierAsd;
asdw += lightDiffuse0.w * multiplierAsd;
asdw += lightDiffuse1.x * multiplierAsd;
asdw += lightDiffuse1.y * multiplierAsd;
asdw += lightDiffuse1.z * multiplierAsd;
asdw += lightDiffuse1.w * multiplierAsd;
asdw += lightDiffuse2.x * multiplierAsd;
asdw += lightDiffuse2.y * multiplierAsd;
asdw += lightDiffuse2.z * multiplierAsd;
asdw += lightDiffuse2.w * multiplierAsd;
asdw += lightDiffuse3.x * multiplierAsd;
asdw += lightDiffuse3.y * multiplierAsd;
asdw += lightDiffuse3.z * multiplierAsd;
asdw += lightDiffuse3.w * multiplierAsd;
asdw += lightDiffuse4.x * multiplierAsd;
asdw += lightDiffuse4.y * multiplierAsd;
asdw += lightDiffuse4.z * multiplierAsd;
asdw += lightDiffuse4.w * multiplierAsd;
asdw += lightDiffuse5.x * multiplierAsd;
asdw += lightDiffuse5.y * multiplierAsd;
asdw += lightDiffuse5.z * multiplierAsd;
asdw += lightDiffuse5.w * multiplierAsd;
asdw += lightDiffuse6.x * multiplierAsd;
asdw += lightDiffuse6.y * multiplierAsd;
asdw += lightDiffuse6.z * multiplierAsd;
asdw += lightDiffuse6.w * multiplierAsd;
asdw += lightDiffuse7.x * multiplierAsd;
asdw += lightDiffuse7.y * multiplierAsd;
asdw += lightDiffuse7.z * multiplierAsd;
asdw += lightDiffuse7.w * multiplierAsd;
asdw += lightDiffuse8.x * multiplierAsd;
asdw += lightDiffuse8.y * multiplierAsd;
asdw += lightDiffuse8.z * multiplierAsd;
asdw += lightDiffuse8.w * multiplierAsd;
asdw += lightDiffuse9.x * multiplierAsd;
asdw += lightDiffuse9.y * multiplierAsd;
asdw += lightDiffuse9.z * multiplierAsd;
asdw += lightDiffuse9.w * multiplierAsd;
asdw += lightDiffuse10.x * multiplierAsd;
asdw += lightDiffuse10.y * multiplierAsd;
asdw += lightDiffuse10.z * multiplierAsd;
asdw += lightDiffuse10.w * multiplierAsd;
asdw += lightDiffuse11.x * multiplierAsd;
asdw += lightDiffuse11.y * multiplierAsd;
asdw += lightDiffuse11.z * multiplierAsd;
asdw += lightDiffuse11.w * multiplierAsd;
asdw += lightDiffuse12.x * multiplierAsd;
asdw += lightDiffuse12.y * multiplierAsd;
asdw += lightDiffuse12.z * multiplierAsd;
asdw += lightDiffuse12.w * multiplierAsd;
asdw += lightDiffuse13.x * multiplierAsd;
asdw += lightDiffuse13.y * multiplierAsd;
asdw += lightDiffuse13.z * multiplierAsd;
asdw += lightDiffuse13.w * multiplierAsd;
asdw += lightDiffuse14.x * multiplierAsd;
asdw += lightDiffuse14.y * multiplierAsd;
asdw += lightDiffuse14.z * multiplierAsd;
asdw += lightDiffuse14.w * multiplierAsd;
asdw += lightDiffuse15.x * multiplierAsd;
asdw += lightDiffuse15.y * multiplierAsd;
asdw += lightDiffuse15.z * multiplierAsd;
asdw += lightDiffuse15.w * multiplierAsd;
asdw += lightDiffuse16.x * multiplierAsd;
asdw += lightDiffuse16.y * multiplierAsd;
asdw += lightDiffuse16.z * multiplierAsd;
asdw += lightDiffuse16.w * multiplierAsd;
asdw += lightDiffuse17.x * multiplierAsd;
asdw += lightDiffuse17.y * multiplierAsd;
asdw += lightDiffuse17.z * multiplierAsd;
asdw += lightDiffuse17.w * multiplierAsd;
asdw += lightDiffuse18.x * multiplierAsd;
asdw += lightDiffuse18.y * multiplierAsd;
asdw += lightDiffuse18.z * multiplierAsd;
asdw += lightDiffuse18.w * multiplierAsd;
asdw += lightDiffuse19.x * multiplierAsd;
asdw += lightDiffuse19.y * multiplierAsd;
asdw += lightDiffuse19.z * multiplierAsd;
asdw += lightDiffuse19.w * multiplierAsd;

asdw += lightSpecular0.x * multiplierAsd;
asdw += lightSpecular0.y * multiplierAsd;
asdw += lightSpecular0.z * multiplierAsd;
asdw += lightSpecular0.w * multiplierAsd;
asdw += lightSpecular1.x * multiplierAsd;
asdw += lightSpecular1.y * multiplierAsd;
asdw += lightSpecular1.z * multiplierAsd;
asdw += lightSpecular1.w * multiplierAsd;
asdw += lightSpecular2.x * multiplierAsd;
asdw += lightSpecular2.y * multiplierAsd;
asdw += lightSpecular2.z * multiplierAsd;
asdw += lightSpecular2.w * multiplierAsd;
asdw += lightSpecular3.x * multiplierAsd;
asdw += lightSpecular3.y * multiplierAsd;
asdw += lightSpecular3.z * multiplierAsd;
asdw += lightSpecular3.w * multiplierAsd;
asdw += lightSpecular4.x * multiplierAsd;
asdw += lightSpecular4.y * multiplierAsd;
asdw += lightSpecular4.z * multiplierAsd;
asdw += lightSpecular4.w * multiplierAsd;
asdw += lightSpecular5.x * multiplierAsd;
asdw += lightSpecular5.y * multiplierAsd;
asdw += lightSpecular5.z * multiplierAsd;
asdw += lightSpecular5.w * multiplierAsd;
asdw += lightSpecular6.x * multiplierAsd;
asdw += lightSpecular6.y * multiplierAsd;
asdw += lightSpecular6.z * multiplierAsd;
asdw += lightSpecular6.w * multiplierAsd;
asdw += lightSpecular7.x * multiplierAsd;
asdw += lightSpecular7.y * multiplierAsd;
asdw += lightSpecular7.z * multiplierAsd;
asdw += lightSpecular7.w * multiplierAsd;
asdw += lightSpecular8.x * multiplierAsd;
asdw += lightSpecular8.y * multiplierAsd;
asdw += lightSpecular8.z * multiplierAsd;
asdw += lightSpecular8.w * multiplierAsd;
asdw += lightSpecular9.x * multiplierAsd;
asdw += lightSpecular9.y * multiplierAsd;
asdw += lightSpecular9.z * multiplierAsd;
asdw += lightSpecular9.w * multiplierAsd;
asdw += lightSpecular10.x * multiplierAsd;
asdw += lightSpecular10.y * multiplierAsd;
asdw += lightSpecular10.z * multiplierAsd;
asdw += lightSpecular10.w * multiplierAsd;
asdw += lightSpecular11.x * multiplierAsd;
asdw += lightSpecular11.y * multiplierAsd;
asdw += lightSpecular11.z * multiplierAsd;
asdw += lightSpecular11.w * multiplierAsd;
asdw += lightSpecular12.x * multiplierAsd;
asdw += lightSpecular12.y * multiplierAsd;
asdw += lightSpecular12.z * multiplierAsd;
asdw += lightSpecular12.w * multiplierAsd;
asdw += lightSpecular13.x * multiplierAsd;
asdw += lightSpecular13.y * multiplierAsd;
asdw += lightSpecular13.z * multiplierAsd;
asdw += lightSpecular13.w * multiplierAsd;
asdw += lightSpecular14.x * multiplierAsd;
asdw += lightSpecular14.y * multiplierAsd;
asdw += lightSpecular14.z * multiplierAsd;
asdw += lightSpecular14.w * multiplierAsd;
asdw += lightSpecular15.x * multiplierAsd;
asdw += lightSpecular15.y * multiplierAsd;
asdw += lightSpecular15.z * multiplierAsd;
asdw += lightSpecular15.w * multiplierAsd;
asdw += lightSpecular16.x * multiplierAsd;
asdw += lightSpecular16.y * multiplierAsd;
asdw += lightSpecular16.z * multiplierAsd;
asdw += lightSpecular16.w * multiplierAsd;
asdw += lightSpecular17.x * multiplierAsd;
asdw += lightSpecular17.y * multiplierAsd;
asdw += lightSpecular17.z * multiplierAsd;
asdw += lightSpecular17.w * multiplierAsd;
asdw += lightSpecular18.x * multiplierAsd;
asdw += lightSpecular18.y * multiplierAsd;
asdw += lightSpecular18.z * multiplierAsd;
asdw += lightSpecular18.w * multiplierAsd;
asdw += lightSpecular19.x * multiplierAsd;
asdw += lightSpecular19.y * multiplierAsd;
asdw += lightSpecular19.z * multiplierAsd;
asdw += lightSpecular19.w * multiplierAsd;

asdw += lightAttenuation0.x * multiplierAsd;
asdw += lightAttenuation0.y * multiplierAsd;
asdw += lightAttenuation0.z * multiplierAsd;
asdw += lightAttenuation0.w * multiplierAsd;
asdw += lightAttenuation1.x * multiplierAsd;
asdw += lightAttenuation1.y * multiplierAsd;
asdw += lightAttenuation1.z * multiplierAsd;
asdw += lightAttenuation1.w * multiplierAsd;
asdw += lightAttenuation2.x * multiplierAsd;
asdw += lightAttenuation2.y * multiplierAsd;
asdw += lightAttenuation2.z * multiplierAsd;
asdw += lightAttenuation2.w * multiplierAsd;
asdw += lightAttenuation3.x * multiplierAsd;
asdw += lightAttenuation3.y * multiplierAsd;
asdw += lightAttenuation3.z * multiplierAsd;
asdw += lightAttenuation3.w * multiplierAsd;
asdw += lightAttenuation4.x * multiplierAsd;
asdw += lightAttenuation4.y * multiplierAsd;
asdw += lightAttenuation4.z * multiplierAsd;
asdw += lightAttenuation4.w * multiplierAsd;
asdw += lightAttenuation5.x * multiplierAsd;
asdw += lightAttenuation5.y * multiplierAsd;
asdw += lightAttenuation5.z * multiplierAsd;
asdw += lightAttenuation5.w * multiplierAsd;
asdw += lightAttenuation6.x * multiplierAsd;
asdw += lightAttenuation6.y * multiplierAsd;
asdw += lightAttenuation6.z * multiplierAsd;
asdw += lightAttenuation6.w * multiplierAsd;
asdw += lightAttenuation7.x * multiplierAsd;
asdw += lightAttenuation7.y * multiplierAsd;
asdw += lightAttenuation7.z * multiplierAsd;
asdw += lightAttenuation7.w * multiplierAsd;
asdw += lightAttenuation8.x * multiplierAsd;
asdw += lightAttenuation8.y * multiplierAsd;
asdw += lightAttenuation8.z * multiplierAsd;
asdw += lightAttenuation8.w * multiplierAsd;
asdw += lightAttenuation9.x * multiplierAsd;
asdw += lightAttenuation9.y * multiplierAsd;
asdw += lightAttenuation9.z * multiplierAsd;
asdw += lightAttenuation9.w * multiplierAsd;
asdw += lightAttenuation10.x * multiplierAsd;
asdw += lightAttenuation10.y * multiplierAsd;
asdw += lightAttenuation10.z * multiplierAsd;
asdw += lightAttenuation10.w * multiplierAsd;
asdw += lightAttenuation11.x * multiplierAsd;
asdw += lightAttenuation11.y * multiplierAsd;
asdw += lightAttenuation11.z * multiplierAsd;
asdw += lightAttenuation11.w * multiplierAsd;
asdw += lightAttenuation12.x * multiplierAsd;
asdw += lightAttenuation12.y * multiplierAsd;
asdw += lightAttenuation12.z * multiplierAsd;
asdw += lightAttenuation12.w * multiplierAsd;
asdw += lightAttenuation13.x * multiplierAsd;
asdw += lightAttenuation13.y * multiplierAsd;
asdw += lightAttenuation13.z * multiplierAsd;
asdw += lightAttenuation13.w * multiplierAsd;
asdw += lightAttenuation14.x * multiplierAsd;
asdw += lightAttenuation14.y * multiplierAsd;
asdw += lightAttenuation14.z * multiplierAsd;
asdw += lightAttenuation14.w * multiplierAsd;
asdw += lightAttenuation15.x * multiplierAsd;
asdw += lightAttenuation15.y * multiplierAsd;
asdw += lightAttenuation15.z * multiplierAsd;
asdw += lightAttenuation15.w * multiplierAsd;
asdw += lightAttenuation16.x * multiplierAsd;
asdw += lightAttenuation16.y * multiplierAsd;
asdw += lightAttenuation16.z * multiplierAsd;
asdw += lightAttenuation16.w * multiplierAsd;
asdw += lightAttenuation17.x * multiplierAsd;
asdw += lightAttenuation17.y * multiplierAsd;
asdw += lightAttenuation17.z * multiplierAsd;
asdw += lightAttenuation17.w * multiplierAsd;
asdw += lightAttenuation18.x * multiplierAsd;
asdw += lightAttenuation18.y * multiplierAsd;
asdw += lightAttenuation18.z * multiplierAsd;
asdw += lightAttenuation18.w * multiplierAsd;
asdw += lightAttenuation19.x * multiplierAsd;
asdw += lightAttenuation19.y * multiplierAsd;
asdw += lightAttenuation19.z * multiplierAsd;
asdw += lightAttenuation19.w * multiplierAsd;

asdw += lightPosition0.x * multiplierAsd;
asdw += lightPosition0.y * multiplierAsd;
asdw += lightPosition0.z * multiplierAsd;
asdw += lightPosition0.w * multiplierAsd;
asdw += lightPosition1.x * multiplierAsd;
asdw += lightPosition1.y * multiplierAsd;
asdw += lightPosition1.z * multiplierAsd;
asdw += lightPosition1.w * multiplierAsd;
asdw += lightPosition2.x * multiplierAsd;
asdw += lightPosition2.y * multiplierAsd;
asdw += lightPosition2.z * multiplierAsd;
asdw += lightPosition2.w * multiplierAsd;
asdw += lightPosition3.x * multiplierAsd;
asdw += lightPosition3.y * multiplierAsd;
asdw += lightPosition3.z * multiplierAsd;
asdw += lightPosition3.w * multiplierAsd;
asdw += lightPosition4.x * multiplierAsd;
asdw += lightPosition4.y * multiplierAsd;
asdw += lightPosition4.z * multiplierAsd;
asdw += lightPosition4.w * multiplierAsd;
asdw += lightPosition5.x * multiplierAsd;
asdw += lightPosition5.y * multiplierAsd;
asdw += lightPosition5.z * multiplierAsd;
asdw += lightPosition5.w * multiplierAsd;
asdw += lightPosition6.x * multiplierAsd;
asdw += lightPosition6.y * multiplierAsd;
asdw += lightPosition6.z * multiplierAsd;
asdw += lightPosition6.w * multiplierAsd;
asdw += lightPosition7.x * multiplierAsd;
asdw += lightPosition7.y * multiplierAsd;
asdw += lightPosition7.z * multiplierAsd;
asdw += lightPosition7.w * multiplierAsd;
asdw += lightPosition8.x * multiplierAsd;
asdw += lightPosition8.y * multiplierAsd;
asdw += lightPosition8.z * multiplierAsd;
asdw += lightPosition8.w * multiplierAsd;
asdw += lightPosition9.x * multiplierAsd;
asdw += lightPosition9.y * multiplierAsd;
asdw += lightPosition9.z * multiplierAsd;
asdw += lightPosition9.w * multiplierAsd;
asdw += lightPosition10.x * multiplierAsd;
asdw += lightPosition10.y * multiplierAsd;
asdw += lightPosition10.z * multiplierAsd;
asdw += lightPosition10.w * multiplierAsd;
asdw += lightPosition11.x * multiplierAsd;
asdw += lightPosition11.y * multiplierAsd;
asdw += lightPosition11.z * multiplierAsd;
asdw += lightPosition11.w * multiplierAsd;
asdw += lightPosition12.x * multiplierAsd;
asdw += lightPosition12.y * multiplierAsd;
asdw += lightPosition12.z * multiplierAsd;
asdw += lightPosition12.w * multiplierAsd;)";

tmpSource +=
R"(
	asdw += lightPosition13.x * multiplierAsd;
	asdw += lightPosition13.y * multiplierAsd;
	asdw += lightPosition13.z * multiplierAsd;
	asdw += lightPosition13.w * multiplierAsd;
	asdw += lightPosition14.x * multiplierAsd;
	asdw += lightPosition14.y * multiplierAsd;
	asdw += lightPosition14.z * multiplierAsd;
	asdw += lightPosition14.w * multiplierAsd;
	asdw += lightPosition15.x * multiplierAsd;
	asdw += lightPosition15.y * multiplierAsd;
	asdw += lightPosition15.z * multiplierAsd;
	asdw += lightPosition15.w * multiplierAsd;
	asdw += lightPosition16.x * multiplierAsd;
	asdw += lightPosition16.y * multiplierAsd;
	asdw += lightPosition16.z * multiplierAsd;
	asdw += lightPosition16.w * multiplierAsd;
	asdw += lightPosition17.x * multiplierAsd;
	asdw += lightPosition17.y * multiplierAsd;
	asdw += lightPosition17.z * multiplierAsd;
	asdw += lightPosition17.w * multiplierAsd;
	asdw += lightPosition18.x * multiplierAsd;
	asdw += lightPosition18.y * multiplierAsd;
	asdw += lightPosition18.z * multiplierAsd;
	asdw += lightPosition18.w * multiplierAsd;
	asdw += lightPosition19.x * multiplierAsd;
	asdw += lightPosition19.y * multiplierAsd;
	asdw += lightPosition19.z * multiplierAsd;
	asdw += lightPosition19.w * multiplierAsd;

asdw += lightDirection0.x * multiplierAsd;
asdw += lightDirection0.y * multiplierAsd;
asdw += lightDirection0.z * multiplierAsd;
asdw += lightDirection0.w * multiplierAsd;
asdw += lightDirection1.x * multiplierAsd;
asdw += lightDirection1.y * multiplierAsd;
asdw += lightDirection1.z * multiplierAsd;
asdw += lightDirection1.w * multiplierAsd;
asdw += lightDirection2.x * multiplierAsd;
asdw += lightDirection2.y * multiplierAsd;
asdw += lightDirection2.z * multiplierAsd;
asdw += lightDirection2.w * multiplierAsd;
asdw += lightDirection3.x * multiplierAsd;
asdw += lightDirection3.y * multiplierAsd;
asdw += lightDirection3.z * multiplierAsd;
asdw += lightDirection3.w * multiplierAsd;
asdw += lightDirection4.x * multiplierAsd;
asdw += lightDirection4.y * multiplierAsd;
asdw += lightDirection4.z * multiplierAsd;
asdw += lightDirection4.w * multiplierAsd;
asdw += lightDirection5.x * multiplierAsd;
asdw += lightDirection5.y * multiplierAsd;
asdw += lightDirection5.z * multiplierAsd;
asdw += lightDirection5.w * multiplierAsd;
asdw += lightDirection6.x * multiplierAsd;
asdw += lightDirection6.y * multiplierAsd;
asdw += lightDirection6.z * multiplierAsd;
asdw += lightDirection6.w * multiplierAsd;
asdw += lightDirection7.x * multiplierAsd;
asdw += lightDirection7.y * multiplierAsd;
asdw += lightDirection7.z * multiplierAsd;
asdw += lightDirection7.w * multiplierAsd;
asdw += lightDirection8.x * multiplierAsd;
asdw += lightDirection8.y * multiplierAsd;
asdw += lightDirection8.z * multiplierAsd;
asdw += lightDirection8.w * multiplierAsd;
asdw += lightDirection9.x * multiplierAsd;
asdw += lightDirection9.y * multiplierAsd;
asdw += lightDirection9.z * multiplierAsd;
asdw += lightDirection9.w * multiplierAsd;
asdw += lightDirection10.x * multiplierAsd;
asdw += lightDirection10.y * multiplierAsd;
asdw += lightDirection10.z * multiplierAsd;
asdw += lightDirection10.w * multiplierAsd;
asdw += lightDirection11.x * multiplierAsd;
asdw += lightDirection11.y * multiplierAsd;
asdw += lightDirection11.z * multiplierAsd;
asdw += lightDirection11.w * multiplierAsd;
asdw += lightDirection12.x * multiplierAsd;
asdw += lightDirection12.y * multiplierAsd;
asdw += lightDirection12.z * multiplierAsd;
asdw += lightDirection12.w * multiplierAsd;
asdw += lightDirection13.x * multiplierAsd;
asdw += lightDirection13.y * multiplierAsd;
asdw += lightDirection13.z * multiplierAsd;
asdw += lightDirection13.w * multiplierAsd;
asdw += lightDirection14.x * multiplierAsd;
asdw += lightDirection14.y * multiplierAsd;
asdw += lightDirection14.z * multiplierAsd;
asdw += lightDirection14.w * multiplierAsd;
asdw += lightDirection15.x * multiplierAsd;
asdw += lightDirection15.y * multiplierAsd;
asdw += lightDirection15.z * multiplierAsd;
asdw += lightDirection15.w * multiplierAsd;
asdw += lightDirection16.x * multiplierAsd;
asdw += lightDirection16.y * multiplierAsd;
asdw += lightDirection16.z * multiplierAsd;
asdw += lightDirection16.w * multiplierAsd;
asdw += lightDirection17.x * multiplierAsd;
asdw += lightDirection17.y * multiplierAsd;
asdw += lightDirection17.z * multiplierAsd;
asdw += lightDirection17.w * multiplierAsd;
asdw += lightDirection18.x * multiplierAsd;
asdw += lightDirection18.y * multiplierAsd;
asdw += lightDirection18.z * multiplierAsd;
asdw += lightDirection18.w * multiplierAsd;
asdw += lightDirection19.x * multiplierAsd;
asdw += lightDirection19.y * multiplierAsd;
asdw += lightDirection19.z * multiplierAsd;
asdw += lightDirection19.w * multiplierAsd;

asdw += lightSpotParams0.x * multiplierAsd;
asdw += lightSpotParams0.y * multiplierAsd;
asdw += lightSpotParams0.z * multiplierAsd;
asdw += lightSpotParams0.w * multiplierAsd;
asdw += lightSpotParams1.x * multiplierAsd;
asdw += lightSpotParams1.y * multiplierAsd;
asdw += lightSpotParams1.z * multiplierAsd;
asdw += lightSpotParams1.w * multiplierAsd;
asdw += lightSpotParams2.x * multiplierAsd;
asdw += lightSpotParams2.y * multiplierAsd;
asdw += lightSpotParams2.z * multiplierAsd;
asdw += lightSpotParams2.w * multiplierAsd;
asdw += lightSpotParams3.x * multiplierAsd;
asdw += lightSpotParams3.y * multiplierAsd;
asdw += lightSpotParams3.z * multiplierAsd;
asdw += lightSpotParams3.w * multiplierAsd;
asdw += lightSpotParams4.x * multiplierAsd;
asdw += lightSpotParams4.y * multiplierAsd;
asdw += lightSpotParams4.z * multiplierAsd;
asdw += lightSpotParams4.w * multiplierAsd;
asdw += lightSpotParams5.x * multiplierAsd;
asdw += lightSpotParams5.y * multiplierAsd;
asdw += lightSpotParams5.z * multiplierAsd;
asdw += lightSpotParams5.w * multiplierAsd;
asdw += lightSpotParams6.x * multiplierAsd;
asdw += lightSpotParams6.y * multiplierAsd;
asdw += lightSpotParams6.z * multiplierAsd;
asdw += lightSpotParams6.w * multiplierAsd;
asdw += lightSpotParams7.x * multiplierAsd;
asdw += lightSpotParams7.y * multiplierAsd;
asdw += lightSpotParams7.z * multiplierAsd;
asdw += lightSpotParams7.w * multiplierAsd;
asdw += lightSpotParams8.x * multiplierAsd;
asdw += lightSpotParams8.y * multiplierAsd;
asdw += lightSpotParams8.z * multiplierAsd;
asdw += lightSpotParams8.w * multiplierAsd;
asdw += lightSpotParams9.x * multiplierAsd;
asdw += lightSpotParams9.y * multiplierAsd;
asdw += lightSpotParams9.z * multiplierAsd;
asdw += lightSpotParams9.w * multiplierAsd;
asdw += lightSpotParams10.x * multiplierAsd;
asdw += lightSpotParams10.y * multiplierAsd;
asdw += lightSpotParams10.z * multiplierAsd;
asdw += lightSpotParams10.w * multiplierAsd;
asdw += lightSpotParams11.x * multiplierAsd;
asdw += lightSpotParams11.y * multiplierAsd;
asdw += lightSpotParams11.z * multiplierAsd;
asdw += lightSpotParams11.w * multiplierAsd;
asdw += lightSpotParams12.x * multiplierAsd;
asdw += lightSpotParams12.y * multiplierAsd;
asdw += lightSpotParams12.z * multiplierAsd;
asdw += lightSpotParams12.w * multiplierAsd;
asdw += lightSpotParams13.x * multiplierAsd;
asdw += lightSpotParams13.y * multiplierAsd;
asdw += lightSpotParams13.z * multiplierAsd;
asdw += lightSpotParams13.w * multiplierAsd;
asdw += lightSpotParams14.x * multiplierAsd;
asdw += lightSpotParams14.y * multiplierAsd;
asdw += lightSpotParams14.z * multiplierAsd;
asdw += lightSpotParams14.w * multiplierAsd;
asdw += lightSpotParams15.x * multiplierAsd;
asdw += lightSpotParams15.y * multiplierAsd;
asdw += lightSpotParams15.z * multiplierAsd;
asdw += lightSpotParams15.w * multiplierAsd;
asdw += lightSpotParams16.x * multiplierAsd;
asdw += lightSpotParams16.y * multiplierAsd;
asdw += lightSpotParams16.z * multiplierAsd;
asdw += lightSpotParams16.w * multiplierAsd;
asdw += lightSpotParams17.x * multiplierAsd;
asdw += lightSpotParams17.y * multiplierAsd;
asdw += lightSpotParams17.z * multiplierAsd;
asdw += lightSpotParams17.w * multiplierAsd;
asdw += lightSpotParams18.x * multiplierAsd;
asdw += lightSpotParams18.y * multiplierAsd;
asdw += lightSpotParams18.z * multiplierAsd;
asdw += lightSpotParams18.w * multiplierAsd;
asdw += lightSpotParams19.x * multiplierAsd;
asdw += lightSpotParams19.y * multiplierAsd;
asdw += lightSpotParams19.z * multiplierAsd;
asdw += lightSpotParams19.w * multiplierAsd;

color.x += asdw;

color.x *= 0.000001;
color.x += lightDiffuse0.x;

return color;
})";
	fragmentProgram->setSource(tmpSource);
	fragmentProgram->load();

	// Add default parameters to the fragment program
	auto fragmentParams = fragmentProgram->getDefaultParameters();

	// Initialize the parameters
	for (int i = 0; i < 20; ++i)
	{
		std::string is = std::to_string(i);
		fragmentParams->setNamedConstant("lightDiffuse" + is, Ogre::Vector4(0.0f, 0.0f, 0.0f, 0.0f));
		fragmentParams->setNamedConstant("lightSpecular" + is, Ogre::Vector4(0.0f, 0.0f, 0.0f, 0.0f));
		fragmentParams->setNamedConstant("lightAttenuation" + is, Ogre::Vector4(0.0f, 0.0f, 0.0f, 0.0f));
		fragmentParams->setNamedConstant("lightPosition" + is, Ogre::Vector4(0.0f, 0.0f, 0.0f, 0.0f));
		fragmentParams->setNamedConstant("lightDirection" + is, Ogre::Vector4(0.0f, 0.0f, 0.0f, 0.0f));
		fragmentParams->setNamedConstant("lightSpotParams" + is, Ogre::Vector4(0.0f, 0.0f, 0.0f, 0.0f));
	}



	// Create the material
	/*

	material TestBug
	{
		technique
		{
			pass
			{
				vertex_program_ref TestBug_VS
				{
				}

				fragment_program_ref TestBug_PS
				{
				}

				texture_unit DiffuseMap
				{
					texture nskingr.jpg // Ninja
				}
			}
		}
	}

	*/

	// Create a material and set its techniques and passes
	Ogre::MaterialManager& matManager = Ogre::MaterialManager::getSingleton();
	auto material = matManager.create(
		"TestBug",
		Ogre::ResourceGroupManager::DEFAULT_RESOURCE_GROUP_NAME);
	auto technique = material->getTechnique(0);
	auto pass = technique->getPass(0);

	// Set vertex and fragment programs to the pass
	pass->setVertexProgram(vertexProgram->getName());
	pass->setFragmentProgram(fragmentProgram->getName());
	pass->createTextureUnitState("nskingr.jpg");
	material->load();



	const float tmpSize = 70.0f;
	const float tmpStepSize = 2.0f;
	for (float x = -tmpSize; x < tmpSize; x += tmpStepSize)
	{
		for (float y = -tmpSize; y < tmpSize; y += tmpStepSize)
		{
			Entity* tmpEntity = mSceneMgr->createEntity("ogrehead.mesh");
			tmpEntity->setMaterialName("TestBug");

			SceneNode* tmpNode = mSceneMgr->getRootSceneNode()->createChildSceneNode();
			tmpNode->attachObject(tmpEntity);
			tmpNode->setPosition(Vector3(x, 0.0f, y));

			tmpNode->setScale(Vector3(0.1f, 0.02f, 0.1f));

			tmpEntities.push_back(tmpEntity);
			tmpSceneNodes.push_back(tmpNode);
		}
	}

	float tmpDistance = 2.0f;
	__SetPosition(mCamera, Vector3(tmpDistance, tmpDistance, tmpDistance));
	__LookAt(mCamera, Vector3::ZERO);
	mCamera->setNearClipDistance(0.01f);
	mCamera->setFarClipDistance(1000.0f);

	mCameraMan->setStyle(CS_MANUAL);

	MaterialManager::getSingleton().setDefaultTextureFiltering(Ogre::TFO_ANISOTROPIC);
	MaterialManager::getSingleton().setDefaultAnisotropy(16);

	SceneNode* tmpParent = mSceneMgr->getRootSceneNode()->createChildSceneNode();
	tmpParent->setPosition(Vector3(0.0f, 2.0f, 0.0f));

	mTrayMgr->showCursor();
}

void cleanupContent() override
{
	for (size_t i = 0; i < tmpEntities.size(); i++)
	{
		tmpEntities[i]->detachFromParent();
		mSceneMgr->destroyEntity(tmpEntities[i]);
	}
	tmpEntities.clear();

	for (size_t i = 0; i < tmpSceneNodes.size(); i++)
	{
		mSceneMgr->destroySceneNode(tmpSceneNodes[i]);
	}
	tmpSceneNodes.clear();
}
};

#endif

Then follow these instructions:

When you start it, if the Ogre dialog does not come up, exit and delete the directory at C:\YOUR_USERNAME\rwn00\Documents\OGRE Sample Browser.

When the Ogre Dialog comes up, select to use Direct3D11, VSync Off and 1920x1080.
Go into the CubeMapping sample (sample 12/54).

Note the FPS. In that scene (a lot of scaled ogre heads with a ninja texture on them) I get around 280 FPS.
Toggle on Numlock to see the models become red. Note the FPS; I get around 190.
Toggle off Numlock to see the models rendered normally again. Notice that the FPS is still 190 even though the previous constant values have been set again.
Toggle on Capslock and immediately toggle it off again. Notice that the FPS goes back to 280.
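
To compare the FPS before and after each toggle without relying on a momentary overlay reading, the frame times could be averaged over a fixed number of frames. The following is only a minimal sketch and is not part of the sample: FrameTimeAverager is a hypothetical helper, and the idea is to feed it evt.timeSinceLastFrame from frameRenderingQueued.

```cpp
#include <cstddef>

// Hypothetical helper (not part of the sample): averages frame times over a
// fixed number of frames so that two FPS readings can be compared fairly.
class FrameTimeAverager
{
public:
	explicit FrameTimeAverager(std::size_t windowFrames)
		: mWindow(windowFrames == 0 ? 1 : windowFrames) {}

	// Feed the seconds elapsed since the previous frame, e.g.
	// evt.timeSinceLastFrame inside frameRenderingQueued().
	// Returns true when a full window has just been completed.
	bool addFrame(double seconds)
	{
		mAccum += seconds;
		if (++mCount < mWindow)
			return false;

		mLastAverageFps = (mAccum > 0.0) ? static_cast<double>(mCount) / mAccum : 0.0;
		mAccum = 0.0;
		mCount = 0;
		return true;
	}

	// Average FPS of the most recently completed window (0 before the first one).
	double averageFps() const { return mLastAverageFps; }

private:
	std::size_t mWindow;
	std::size_t mCount = 0;
	double mAccum = 0.0;
	double mLastAverageFps = 0.0;
};
```

Whenever addFrame returns true, averageFps could be written to the log, which makes the 280 vs. 190 comparison easier to report from other machines.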

Some scenes are worse than this. For some meshes I get 800 FPS before the bug and 150 FPS after the bug, but that is a bit demanding on the GPU, so I chose to use the ogrehead mesh instead, at a somewhat lower FPS.

What is actually happening in the code is that I call setNamedConstant on all constants in the material's shader, every frame, setting them to either "ColourValue::Black" or, when Numlock is active, "ColourValue(0.5f, 0.5f, 0.5f, 1.0f)" (which turns the models red because of how the shader works).
So in short, lights are not even needed to reproduce this bug.
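
For reference, the set of constants rewritten each frame is six families of 20 constants, 120 in total, matching the float4 declarations in the shader. A small sketch that enumerates those names is shown here; buildLightConstantNames is a hypothetical helper that does not exist in the sample, which builds the names inline instead.

```cpp
#include <string>
#include <vector>

// Builds the names of every light constant declared by the test shader:
// six families, indices 0..lightCount-1 (20 in the sample, so 120 names).
// buildLightConstantNames is a hypothetical helper, not part of the sample.
std::vector<std::string> buildLightConstantNames(int lightCount)
{
	static const char* const families[] = {
		"lightDiffuse", "lightSpecular", "lightAttenuation",
		"lightPosition", "lightDirection", "lightSpotParams"
	};

	std::vector<std::string> names;
	for (int i = 0; i < lightCount; ++i)
		for (const char* family : families)
			names.push_back(family + std::to_string(i));
	return names;
}
```

With the list built once, the per-frame work would amount to looping over it and calling setNamedConstant(name, c) on the fragment program parameters for every entry, which is the same set of calls the sample makes.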

While Capslock is toggled on, the hacky fix from my previous post is applied, which fixes the FPS bug (at least temporarily, until you start using Numlock again).

If you press the middle mouse button, the fragment shader is recompiled, which also fixes the FPS bug (at least temporarily, until you start using numlock again).

You can also toggle on Numlock and see the FPS go down to 190 and then use capslock or the middle mouse button to fix the FPS bug as well, which shows that making the models/shader look red is not what is causing the FPS drop.

The same bug also happens for D3D9. The code automatically detects whether you are using D3D9 or D3D11 and creates the shaders needed. There, you get 800 FPS at first (D3D9 is much faster than D3D11 here for some reason) and then 200 FPS after the bug has happened.
The bug requires a bit more switching of Numlock on and off to happen for D3D9, and my Capslock fix is only implemented for D3D11, but the middle mouse button, which recompiles the shader, at least works to fix the bug.
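
As an illustration of the D3D9/D3D11 switch, the pixel shader profile could be chosen from the render system name. This is only a sketch under an assumption: it supposes that the D3D9 render system's name contains "Direct3D9" and treats everything else as D3D11, like the if/else in the sample. pickPixelShaderTarget is a hypothetical helper, and the sample's own check may differ.

```cpp
#include <string>

// Hypothetical helper: picks the HLSL pixel shader profile from the render
// system name. Assumes the D3D9 render system's name contains "Direct3D9";
// anything else is treated as D3D11, mirroring the sample's if/else.
std::string pickPixelShaderTarget(const std::string& renderSystemName)
{
	const bool isUsingD3D9 = renderSystemName.find("Direct3D9") != std::string::npos;
	return isUsingD3D9 ? "ps_3_0" : "ps_5_0";
}
```

The name itself could come from Ogre::Root::getSingleton().getRenderSystem()->getName(), and the result would be passed to fragmentProgram->setParameter("target", ...).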

And if you want specific code to "fix" the issue by using Capslock, add the code detailed here (not needed to reproduce the bug though):
viewtopic.php?p=556942#p556942

sercero
Bronze Sponsor
Posts: 487
Joined: Sun Jan 18, 2015 4:20 pm
Location: Buenos Aires, Argentina
x 170

Re: Light Creation/Deletion Performance Bug

Post by sercero »

Do you still want me to try to reproduce it on my machine?

It seems that I have not followed the instructions properly: I always have num lock enabled by default.

I'm not able to help with the problem because this goes over my head, but perhaps I can help in some other way.

rpgplayerrobin
Orc Shaman
Posts: 710
Joined: Wed Mar 18, 2009 3:03 am
x 391

Re: Light Creation/Deletion Performance Bug

Post by rpgplayerrobin »

Sure! I am also gathering data from some more computers, but it will take some time.

paroj
OGRE Team Member
Posts: 2128
Joined: Sun Mar 30, 2014 2:51 pm
x 1141

Re: Light Creation/Deletion Performance Bug

Post by paroj »

rpgplayerrobin wrote: Wed Dec 18, 2024 10:12 pm

I have now created the smallest amount of code to reproduce the issue without the need of any external resources.
It creates the shader and materials in code now, and therefore only needs the code below to replace the code in CubeMapping.h (Samples\Simple\include\CubeMapping.h):

cannot reproduce. FPS are around 1550-1560 regardless of the setting. NVIDIA RTX 4070, Driver 561.09

rpgplayerrobin
Orc Shaman
Posts: 710
Joined: Wed Mar 18, 2009 3:03 am
x 391

Re: Light Creation/Deletion Performance Bug

Post by rpgplayerrobin »

paroj wrote: Thu Dec 26, 2024 1:19 pm

cannot reproduce. FPS are around 1550-1560 regardless of the setting. NVIDIA RTX 4070, Driver 561.09

I have experienced that as well on some computers, which is why I have made a couple of changes to the code to showcase the problem better.
Could you try the new version?

Here is the full code needed (Samples\Simple\include\CubeMapping.h):

Code: Select all

#ifndef __CubeMapping_H__
#define __CubeMapping_H__

#include "SdkSample.h"
#include "windows.h"

//#define USE_LIGHT_ARRAYS

using namespace Ogre;
using namespace OgreBites;

class _OgreSampleClassExport Sample_CubeMapping : public SdkSample, public RenderTargetListener
{
public:

Sample_CubeMapping()
{
    mInfo["Title"] = "Cube Mapping";
    mInfo["Description"] = "Demonstrates the cube mapping feature where a wrap-around environment is reflected "
        "off of an object. Uses render-to-texture to create dynamic cubemaps.";
    mInfo["Thumbnail"] = "thumb_cubemap.png";
    mInfo["Category"] = "Unsorted";
}

std::vector<Entity*> tmpEntities;
std::vector<SceneNode*> tmpSceneNodes;

bool IsNumlockToggled()
{
	return (GetKeyState(VK_NUMLOCK) & 0x0001) != 0;
}

bool frameRenderingQueued(const FrameEvent& evt) override
{
	bool r = SdkSample::frameRenderingQueued(evt);      // don't forget the parent updates!

#ifndef USE_LIGHT_ARRAYS
	// Toggle every per-light constant between black and grey based on Num Lock.
	// Changing these shader parameters is what triggers the FPS drop.
	ColourValue c = ColourValue(0.0f, 0.0f, 0.0f, 1.0f);
	if (IsNumlockToggled())
		c = ColourValue(0.5f, 0.5f, 0.5f, 1.0f);
	if (tmpEntities.size() != 0)
	{
		Entity* e = tmpEntities[0];
		MaterialPtr m = e->getSubEntity(0)->getMaterial();

		GpuProgramParametersSharedPtr gpup = m->getTechnique(0)->getPass(0)->getFragmentProgramParameters();
		for (int i = 0; i <= 19; i++)
		{
			std::string is = std::to_string(i);

			gpup->setNamedConstant("lightDiffuse" + is, c);
			gpup->setNamedConstant("lightSpecular" + is, c);
			gpup->setNamedConstant("lightAttenuation" + is, c);
			gpup->setNamedConstant("lightPosition" + is, c);
			gpup->setNamedConstant("lightDirection" + is, c);
			gpup->setNamedConstant("lightSpotParams" + is, c);
		}
	}
#endif

	return r;
}

bool mouseReleased(const MouseButtonEvent& evt) override
{
	if (mTrayMgr->mouseReleased(evt))
		return true;
	if (evt.button == BUTTON_MIDDLE)
	{
		std::vector<GpuProgramPtr> tmpGPUPrograms;
		ResourceManager::ResourceMapIterator tmpItr = GpuProgramManager::getSingleton().getResourceIterator();
		if (tmpItr.begin() == tmpItr.end())
			tmpItr = HighLevelGpuProgramManager::getSingleton().getResourceIterator();
		for (ResourceManager::ResourceMapIterator::const_iterator i = tmpItr.begin(); i != tmpItr.end(); ++i)
		{
			const ResourcePtr tmpResourcePtr = i->second;

			const Ogre::String& tmpName = tmpResourcePtr->getName();
			if (/*tmpName == "Test_AlphaRejection_VS" ||*/
				tmpName == "TestBug_PS")
			{
				GpuProgramPtr tmpGpuProgram = GpuProgramManager::getSingleton().getByName(tmpName);
				if (tmpGpuProgram)
					tmpGPUPrograms.push_back(tmpGpuProgram);
			}
		}

		for (size_t i = 0; i < tmpGPUPrograms.size(); i++)
			tmpGPUPrograms[i]->reload();
	}

#ifdef USE_LIGHT_ARRAYS
	if (evt.button == BUTTON_RIGHT)
	{
		static std::vector<Light*> tmpLights;
		if (tmpLights.size() == 0)
		{
			SceneNode* tmpParent = mSceneMgr->getRootSceneNode()->createChildSceneNode();
			tmpParent->setPosition(Vector3(0.0f, 2.0f, 0.0f));
			// Create the lights
			for (int i = 0; i < 300; i++)
			{
				Light* tmpLight = mSceneMgr->createLight();
				tmpParent->attachObject(tmpLight);

				tmpLight->setCastShadows(false);
				tmpLight->setType(Light::LT_POINT);
				tmpLight->setDiffuseColour(ColourValue(0.5f, 0.5f, 0.5f, 1.0f));
				tmpLight->setSpecularColour(ColourValue(0.01f, 0.01f, 0.01f));
				tmpLight->setAttenuation(1000000.0f, 0.0f, 0.01f, 0.0f);

				tmpLights.push_back(tmpLight);
			}
		}
		else
		{
			SceneNode* tmpParent = tmpLights[0]->getParentSceneNode();
			// Destroy the lights
			for (int i = 0; i < (int)tmpLights.size(); i++)
			{
				Light* tmpLight = tmpLights[i];

				tmpLight->detachFromParent();
				mSceneMgr->destroyLight(tmpLight);
			}
			mSceneMgr->destroySceneNode(tmpParent);

			// Clear the list of lights
			tmpLights.clear();
		}
	}
#endif

	return true;
}

protected:

// Helper functions taken from my main project
Vector3 __GetPosition(Camera* obj)
{
	return obj->getDerivedPosition();
}
void __SetPosition(Camera* obj, const Vector3& vec)
{
	Node* tmpNode = obj->getParentNode();
	tmpNode->setPosition(vec);
}
void __LookAt(Camera* obj, const Vector3& pos)
{
	Vector3 tmpPosition = __GetPosition(obj);
	__SetDirection(obj, (pos - tmpPosition).normalisedCopy(), true);
}
Quaternion __GetOrientation(Camera* obj)
{
	return obj->getDerivedOrientation();
}
void __SetDirection(Camera* obj, const Vector3& vec, bool yawFixed)
{
	Quaternion mOrientation = Quaternion::IDENTITY;

	if (vec == Vector3::ZERO) return;

	Vector3 zAdjustVec = -vec;
	zAdjustVec.normalise();

	Quaternion targetWorldOrientation;

	if (yawFixed)
	{
		Vector3 mYawFixedAxis = Vector3::UNIT_Y;

		Vector3 xVec = mYawFixedAxis.crossProduct(zAdjustVec);
		xVec.normalise();

		Vector3 yVec = zAdjustVec.crossProduct(xVec);
		yVec.normalise();

		targetWorldOrientation.FromAxes(xVec, yVec, zAdjustVec);
	}
	else
	{
		Quaternion mRealOrientation = __GetOrientation(obj);

		Vector3 axes[3];
		mRealOrientation.ToAxes(axes);
		Quaternion rotQuat;
		if ((axes[2] + zAdjustVec).squaredLength() < 0.00005f)
		{
			rotQuat.FromAngleAxis(Radian(Math::PI), axes[1]);
		}
		else
		{
			rotQuat = axes[2].getRotationTo(zAdjustVec);
		}
		targetWorldOrientation = rotQuat * mRealOrientation;
	}

	mOrientation = targetWorldOrientation;

	SceneNode* tmpNode = obj->getParentSceneNode();
	tmpNode->_setDerivedOrientation(mOrientation);
	tmpNode->_update(true, false);
}

void setupContent() override
{
	// Create the vertex shader and its material

	Ogre::String renderSystemName = mRoot->getRenderSystem()->getName();
	bool isUsingD3D9 = renderSystemName == "Direct3D9 Rendering Subsystem";

	// Create the vertex program
	Ogre::GpuProgramManager& gpuProgManager = Ogre::GpuProgramManager::getSingleton();
	auto vertexProgram = gpuProgManager.createProgram(
		"TestBug_VS",
		Ogre::ResourceGroupManager::DEFAULT_RESOURCE_GROUP_NAME,
		"hlsl",
		Ogre::GPT_VERTEX_PROGRAM);
	vertexProgram->setParameter("entry_point", "main_vs");

	if(isUsingD3D9)
		vertexProgram->setParameter("target", "vs_3_0");
	else
		vertexProgram->setParameter("target", "vs_5_0");

	Ogre::String tmpSource =
R"(float4x4 modelViewProj;

struct VS_OUTPUT
{
	float3 oVertexPos : TEXCOORD0;
	float2 oUV : TEXCOORD1;
};

VS_OUTPUT main_vs(float4 position : POSITION,
	out float4 oPosition : POSITION,
	float2 iUV : TEXCOORD0)
{
	VS_OUTPUT Out;

oPosition = mul(modelViewProj, position);
Out.oVertexPos = position.xyz;
Out.oUV = iUV;

return Out;
})";
		vertexProgram->setSource(tmpSource);
		vertexProgram->load();

	/*

	vertex_program TestBug_VS hlsl
	{
		//source X_VS.hlsl
		entry_point main_vs
		target vs_5_0

		default_params
		{
			param_named_auto modelViewProj worldviewproj_matrix
		}
	}

	*/

	// Add default parameters to the vertex program
	auto vertexParams = vertexProgram->getDefaultParameters();
	vertexParams->setNamedAutoConstant("modelViewProj", Ogre::GpuProgramParameters::ACT_WORLDVIEWPROJ_MATRIX);




	// Create the fragment shader and its material
	auto fragmentProgram = gpuProgManager.createProgram(
		"TestBug_PS",
		Ogre::ResourceGroupManager::DEFAULT_RESOURCE_GROUP_NAME,
		"hlsl",
		Ogre::GPT_FRAGMENT_PROGRAM);
	fragmentProgram->setParameter("entry_point", "main_ps");

	if(isUsingD3D9)
		fragmentProgram->setParameter("target", "ps_3_0");
	else
		fragmentProgram->setParameter("target", "ps_5_0");

	tmpSource =
		R"(
struct LIGHT_OUTPUT
{
	float3 diffuse;
	float3 specular;
};

float computeAttenuation( float3 vertexposition, float3 lightposition,
				   float4 attenuation )
{
	float d = distance( vertexposition, lightposition );
	return 1/(attenuation.y + attenuation.z * d + attenuation.w * d * d);
}

float computeSpotlight( float3 LightToVertexNorm, float3 lightdirection, float3 spotlightparams )
{
	float cosSpotLightAngle = saturate(dot(LightToVertexNorm, lightdirection));
	float spotFactor = pow(saturate(cosSpotLightAngle - spotlightparams.y) / (spotlightparams.x - spotlightparams.y), spotlightparams.z);
	return spotFactor;
}

LIGHT_OUTPUT computeLighting( float3 lightposition, float3 P, float3 N, float3 V, float materialsh,
							 float4 lightattenuation,
							 float3 lightspecular, float3 lightdiffuse, float3 spotlightparams,
							 float3 lightdirection )
{
	float3 L = normalize( lightposition - P );
	float diffuseLight = max( dot( N, L ), 0);
	float3 H = normalize( L + V );
	float specularLight = pow(max( dot( N, H ), 0 ), materialsh);
	if( diffuseLight <= 0 ) specularLight = 0;

float AttenuationFactor = computeAttenuation( P, lightposition, lightattenuation );

float spotlight = 1.0f;
if( spotlightparams[ 0 ] != 1.0f )
	spotlight = computeSpotlight( normalize( (P - lightposition) ), lightdirection, spotlightparams );

LIGHT_OUTPUT Out;
Out.diffuse = diffuseLight * AttenuationFactor * spotlight * lightdiffuse;
Out.specular = specularLight * AttenuationFactor * spotlight * lightspecular;

return Out;
}


)";

#ifdef USE_LIGHT_ARRAYS
		tmpSource +=
			R"(
float4 lightDiffuse[20];
float4 lightSpecular[20];
float4 lightAttenuation[20];
float4 lightPosition[20];
float4 lightDirection[20];
float4 lightSpotParams[20];



)";
#else
		for (int i = 0; i < 20; i++)
		{
			std::string is = std::to_string(i);
			tmpSource += "float4 lightDiffuse" + is + ";\n";
		}
		for (int i = 0; i < 20; i++)
		{
			std::string is = std::to_string(i);
			tmpSource += "float4 lightSpecular" + is + ";\n";
		}
		for (int i = 0; i < 20; i++)
		{
			std::string is = std::to_string(i);
			tmpSource += "float4 lightAttenuation" + is + ";\n";
		}
		for (int i = 0; i < 20; i++)
		{
			std::string is = std::to_string(i);
			tmpSource += "float4 lightPosition" + is + ";\n";
		}
		for (int i = 0; i < 20; i++)
		{
			std::string is = std::to_string(i);
			tmpSource += "float4 lightDirection" + is + ";\n";
		}
		for (int i = 0; i < 20; i++)
		{
			std::string is = std::to_string(i);
			tmpSource += "float4 lightSpotParams" + is + ";\n";
		}
		tmpSource += "\n\n\n";
#endif

	if (isUsingD3D9)
	{
		tmpSource +=
R"(sampler DiffuseMap : register(s0);)";
	}
	else
	{
		tmpSource +=
R"(SamplerState DiffuseMap_state : register(s0);
Texture2D DiffuseMap : register(t0);)";
	}

	tmpSource +=
R"(
float4 main_ps(float3 position : TEXCOORD0,
			   float2 uv : TEXCOORD1 ) : COLOR0
{
	float4 color = float4(0,0,0,0);
)";

	if (isUsingD3D9)
	{
		tmpSource +=
R"(
	float3 DiffuseMapColour = tex2D(DiffuseMap, uv).xyz;
)";
	}
	else
	{
		tmpSource +=
R"(
	float3 DiffuseMapColour = DiffuseMap.Sample(DiffuseMap_state, uv).xyz;
)";
	}

	tmpSource +=
R"(
	color.xyz += DiffuseMapColour * 0.5;

color.w = 0.5;

float asdw = 0.0;

asdw += position.x * 0.00001;
asdw += position.y * 0.00001;
asdw += position.z * 0.00001;

const float multiplierAsd = 0.01;)";

	tmpSource += "\n";
	tmpSource += "\tfloat3 P = float3(1, 0, 0);\n";
	tmpSource += "\tfloat3 V = float3(0, 0, 1);\n";
	tmpSource += "\tfloat3 N = float3(0, 1, 0);\n";
	tmpSource += "\tfloat3 diffuse = float3(0, 0, 0);\n";
	tmpSource += "\tfloat3 specular = float3(0, 0, 0);\n";
	tmpSource += "\tfloat materialsh = 0.5;\n";
	tmpSource += "\n";

#ifdef USE_LIGHT_ARRAYS
		tmpSource += R"(

[unroll]
for(int i = 0; i < 20; i++)
{
	LIGHT_OUTPUT tmpLightOut = computeLighting(lightPosition[i].xyz, P, N, V, materialsh, lightAttenuation[i], lightSpecular[i].xyz, lightDiffuse[i].xyz, lightSpotParams[i].xyz, lightDirection[i].xyz);
	diffuse += tmpLightOut.diffuse;
	specular += tmpLightOut.specular;
}
)";
#else
		for (int i = 0; i <= 19; i++)
		{
			std::string is = std::to_string(i);
			tmpSource += "\n\t";
			if (i == 0)
				tmpSource += "LIGHT_OUTPUT ";
			tmpSource += "tmpLightOut = computeLighting(lightPosition" + is + ".xyz, P, N, V, materialsh, lightAttenuation" + is + ", lightSpecular" + is + ".xyz, lightDiffuse" + is + ".xyz, lightSpotParams" + is + ".xyz, lightDirection" + is + ".xyz);";
			tmpSource += "\n\t";
			tmpSource += "diffuse += tmpLightOut.diffuse;";
			tmpSource += "\n\t";
			tmpSource += "specular += tmpLightOut.specular;";
			tmpSource += "\n";
		}
#endif

	tmpSource += "\n";
	tmpSource += "\tdiffuse = saturate(diffuse);\n";
	tmpSource += "\tspecular = saturate(specular);\n";
	tmpSource += "\tasdw += diffuse.x * multiplierAsd;\n";
	tmpSource += "\tasdw += diffuse.y * multiplierAsd;\n";
	tmpSource += "\tasdw += diffuse.z * multiplierAsd;\n";
	tmpSource += "\tasdw += specular.x * multiplierAsd;\n";
	tmpSource += "\tasdw += specular.y * multiplierAsd;\n";
	tmpSource += "\tasdw += specular.z * multiplierAsd;\n";

#ifdef USE_LIGHT_ARRAYS
		tmpSource +=
	R"(
	color.x += asdw;

color.x *= 0.000001;
color.x += lightDiffuse[0].x;

return color;
})";
#else
		tmpSource +=
	R"(
	color.x += asdw;

color.x *= 0.000001;
color.x += lightDiffuse0.x;

return color;
})";
#endif

	fragmentProgram->setSource(tmpSource);
	fragmentProgram->load();

	// Add default parameters to the fragment program
	auto fragmentParams = fragmentProgram->getDefaultParameters();

#ifdef USE_LIGHT_ARRAYS
		/*

	fragment_program TestBug_PS hlsl
	{
		//source X_PS.hlsl
		entry_point main_ps
		target ps_5_0

		default_params
		{
			param_named_auto lightDiffuse light_diffuse_colour_array 20
			param_named_auto lightSpecular light_diffuse_colour_array 20
			param_named_auto lightAttenuation light_diffuse_colour_array 20
			param_named_auto lightPosition light_diffuse_colour_array 20
			param_named_auto lightDirection light_diffuse_colour_array 20
			param_named_auto lightSpotParams light_diffuse_colour_array 20
		}
	}

	*/

	fragmentParams->setNamedAutoConstant("lightDiffuse", GpuProgramParameters::ACT_LIGHT_DIFFUSE_COLOUR_ARRAY, 20); // Easiest to use diffuse on all of them to show the bug
	fragmentParams->setNamedAutoConstant("lightSpecular", GpuProgramParameters::ACT_LIGHT_DIFFUSE_COLOUR_ARRAY, 20);
	fragmentParams->setNamedAutoConstant("lightAttenuation", GpuProgramParameters::ACT_LIGHT_DIFFUSE_COLOUR_ARRAY, 20);
	fragmentParams->setNamedAutoConstant("lightPosition", GpuProgramParameters::ACT_LIGHT_DIFFUSE_COLOUR_ARRAY, 20);
	fragmentParams->setNamedAutoConstant("lightDirection", GpuProgramParameters::ACT_LIGHT_DIFFUSE_COLOUR_ARRAY, 20);
	fragmentParams->setNamedAutoConstant("lightSpotParams", GpuProgramParameters::ACT_LIGHT_DIFFUSE_COLOUR_ARRAY, 20);
	/*fragmentParams->setNamedAutoConstant("lightSpecular", GpuProgramParameters::ACT_LIGHT_SPECULAR_COLOUR_ARRAY, 20);
	fragmentParams->setNamedAutoConstant("lightAttenuation", GpuProgramParameters::ACT_LIGHT_ATTENUATION_ARRAY, 20);
	fragmentParams->setNamedAutoConstant("lightPosition", GpuProgramParameters::ACT_LIGHT_POSITION_ARRAY, 20);
	fragmentParams->setNamedAutoConstant("lightDirection", GpuProgramParameters::ACT_LIGHT_DIRECTION_ARRAY, 20);
	fragmentParams->setNamedAutoConstant("lightSpotParams", GpuProgramParameters::ACT_SPOTLIGHT_PARAMS_ARRAY, 20);*/
#else
		/*

	fragment_program TestBug_PS hlsl
	{
		//source X_PS.hlsl
		entry_point main_ps
		target ps_5_0

		default_params
		{
			param_named lightDiffuse0 float4 0.0 0.0 0.0 0.0
			param_named lightDiffuse1 float4 0.0 0.0 0.0 0.0
			param_named lightDiffuse2 float4 0.0 0.0 0.0 0.0
			param_named lightDiffuse3 float4 0.0 0.0 0.0 0.0
			param_named lightDiffuse4 float4 0.0 0.0 0.0 0.0
			param_named lightDiffuse5 float4 0.0 0.0 0.0 0.0
			param_named lightDiffuse6 float4 0.0 0.0 0.0 0.0
			param_named lightDiffuse7 float4 0.0 0.0 0.0 0.0
			param_named lightDiffuse8 float4 0.0 0.0 0.0 0.0
			param_named lightDiffuse9 float4 0.0 0.0 0.0 0.0
			param_named lightDiffuse10 float4 0.0 0.0 0.0 0.0
			param_named lightDiffuse11 float4 0.0 0.0 0.0 0.0
			param_named lightDiffuse12 float4 0.0 0.0 0.0 0.0
			param_named lightDiffuse13 float4 0.0 0.0 0.0 0.0
			param_named lightDiffuse14 float4 0.0 0.0 0.0 0.0
			param_named lightDiffuse15 float4 0.0 0.0 0.0 0.0
			param_named lightDiffuse16 float4 0.0 0.0 0.0 0.0
			param_named lightDiffuse17 float4 0.0 0.0 0.0 0.0
			param_named lightDiffuse18 float4 0.0 0.0 0.0 0.0
			param_named lightDiffuse19 float4 0.0 0.0 0.0 0.0

			param_named lightSpecular0 float4 0.0 0.0 0.0 0.0
			param_named lightSpecular1 float4 0.0 0.0 0.0 0.0
			param_named lightSpecular2 float4 0.0 0.0 0.0 0.0
			param_named lightSpecular3 float4 0.0 0.0 0.0 0.0
			param_named lightSpecular4 float4 0.0 0.0 0.0 0.0
			param_named lightSpecular5 float4 0.0 0.0 0.0 0.0
			param_named lightSpecular6 float4 0.0 0.0 0.0 0.0
			param_named lightSpecular7 float4 0.0 0.0 0.0 0.0
			param_named lightSpecular8 float4 0.0 0.0 0.0 0.0
			param_named lightSpecular9 float4 0.0 0.0 0.0 0.0
			param_named lightSpecular10 float4 0.0 0.0 0.0 0.0
			param_named lightSpecular11 float4 0.0 0.0 0.0 0.0
			param_named lightSpecular12 float4 0.0 0.0 0.0 0.0
			param_named lightSpecular13 float4 0.0 0.0 0.0 0.0
			param_named lightSpecular14 float4 0.0 0.0 0.0 0.0
			param_named lightSpecular15 float4 0.0 0.0 0.0 0.0
			param_named lightSpecular16 float4 0.0 0.0 0.0 0.0
			param_named lightSpecular17 float4 0.0 0.0 0.0 0.0
			param_named lightSpecular18 float4 0.0 0.0 0.0 0.0
			param_named lightSpecular19 float4 0.0 0.0 0.0 0.0

			param_named lightAttenuation0 float4 0.0 0.0 0.0 0.0
			param_named lightAttenuation1 float4 0.0 0.0 0.0 0.0
			param_named lightAttenuation2 float4 0.0 0.0 0.0 0.0
			param_named lightAttenuation3 float4 0.0 0.0 0.0 0.0
			param_named lightAttenuation4 float4 0.0 0.0 0.0 0.0
			param_named lightAttenuation5 float4 0.0 0.0 0.0 0.0
			param_named lightAttenuation6 float4 0.0 0.0 0.0 0.0
			param_named lightAttenuation7 float4 0.0 0.0 0.0 0.0
			param_named lightAttenuation8 float4 0.0 0.0 0.0 0.0
			param_named lightAttenuation9 float4 0.0 0.0 0.0 0.0
			param_named lightAttenuation10 float4 0.0 0.0 0.0 0.0
			param_named lightAttenuation11 float4 0.0 0.0 0.0 0.0
			param_named lightAttenuation12 float4 0.0 0.0 0.0 0.0
			param_named lightAttenuation13 float4 0.0 0.0 0.0 0.0
			param_named lightAttenuation14 float4 0.0 0.0 0.0 0.0
			param_named lightAttenuation15 float4 0.0 0.0 0.0 0.0
			param_named lightAttenuation16 float4 0.0 0.0 0.0 0.0
			param_named lightAttenuation17 float4 0.0 0.0 0.0 0.0
			param_named lightAttenuation18 float4 0.0 0.0 0.0 0.0
			param_named lightAttenuation19 float4 0.0 0.0 0.0 0.0

			param_named lightPosition0 float4 0.0 0.0 0.0 0.0
			param_named lightPosition1 float4 0.0 0.0 0.0 0.0
			param_named lightPosition2 float4 0.0 0.0 0.0 0.0
			param_named lightPosition3 float4 0.0 0.0 0.0 0.0
			param_named lightPosition4 float4 0.0 0.0 0.0 0.0
			param_named lightPosition5 float4 0.0 0.0 0.0 0.0
			param_named lightPosition6 float4 0.0 0.0 0.0 0.0
			param_named lightPosition7 float4 0.0 0.0 0.0 0.0
			param_named lightPosition8 float4 0.0 0.0 0.0 0.0
			param_named lightPosition9 float4 0.0 0.0 0.0 0.0
			param_named lightPosition10 float4 0.0 0.0 0.0 0.0
			param_named lightPosition11 float4 0.0 0.0 0.0 0.0
			param_named lightPosition12 float4 0.0 0.0 0.0 0.0
			param_named lightPosition13 float4 0.0 0.0 0.0 0.0
			param_named lightPosition14 float4 0.0 0.0 0.0 0.0
			param_named lightPosition15 float4 0.0 0.0 0.0 0.0
			param_named lightPosition16 float4 0.0 0.0 0.0 0.0
			param_named lightPosition17 float4 0.0 0.0 0.0 0.0
			param_named lightPosition18 float4 0.0 0.0 0.0 0.0
			param_named lightPosition19 float4 0.0 0.0 0.0 0.0

			param_named lightDirection0 float4 0.0 0.0 0.0 0.0
			param_named lightDirection1 float4 0.0 0.0 0.0 0.0
			param_named lightDirection2 float4 0.0 0.0 0.0 0.0
			param_named lightDirection3 float4 0.0 0.0 0.0 0.0
			param_named lightDirection4 float4 0.0 0.0 0.0 0.0
			param_named lightDirection5 float4 0.0 0.0 0.0 0.0
			param_named lightDirection6 float4 0.0 0.0 0.0 0.0
			param_named lightDirection7 float4 0.0 0.0 0.0 0.0
			param_named lightDirection8 float4 0.0 0.0 0.0 0.0
			param_named lightDirection9 float4 0.0 0.0 0.0 0.0
			param_named lightDirection10 float4 0.0 0.0 0.0 0.0
			param_named lightDirection11 float4 0.0 0.0 0.0 0.0
			param_named lightDirection12 float4 0.0 0.0 0.0 0.0
			param_named lightDirection13 float4 0.0 0.0 0.0 0.0
			param_named lightDirection14 float4 0.0 0.0 0.0 0.0
			param_named lightDirection15 float4 0.0 0.0 0.0 0.0
			param_named lightDirection16 float4 0.0 0.0 0.0 0.0
			param_named lightDirection17 float4 0.0 0.0 0.0 0.0
			param_named lightDirection18 float4 0.0 0.0 0.0 0.0
			param_named lightDirection19 float4 0.0 0.0 0.0 0.0

			param_named lightSpotParams0 float4 0.0 0.0 0.0 0.0
			param_named lightSpotParams1 float4 0.0 0.0 0.0 0.0
			param_named lightSpotParams2 float4 0.0 0.0 0.0 0.0
			param_named lightSpotParams3 float4 0.0 0.0 0.0 0.0
			param_named lightSpotParams4 float4 0.0 0.0 0.0 0.0
			param_named lightSpotParams5 float4 0.0 0.0 0.0 0.0
			param_named lightSpotParams6 float4 0.0 0.0 0.0 0.0
			param_named lightSpotParams7 float4 0.0 0.0 0.0 0.0
			param_named lightSpotParams8 float4 0.0 0.0 0.0 0.0
			param_named lightSpotParams9 float4 0.0 0.0 0.0 0.0
			param_named lightSpotParams10 float4 0.0 0.0 0.0 0.0
			param_named lightSpotParams11 float4 0.0 0.0 0.0 0.0
			param_named lightSpotParams12 float4 0.0 0.0 0.0 0.0
			param_named lightSpotParams13 float4 0.0 0.0 0.0 0.0
			param_named lightSpotParams14 float4 0.0 0.0 0.0 0.0
			param_named lightSpotParams15 float4 0.0 0.0 0.0 0.0
			param_named lightSpotParams16 float4 0.0 0.0 0.0 0.0
			param_named lightSpotParams17 float4 0.0 0.0 0.0 0.0
			param_named lightSpotParams18 float4 0.0 0.0 0.0 0.0
			param_named lightSpotParams19 float4 0.0 0.0 0.0 0.0
		}
	}

	*/

	for (int i = 0; i < 20; ++i)
	{
		std::string is = std::to_string(i);
		fragmentParams->setNamedConstant("lightDiffuse" + is, Ogre::Vector4(0.0f, 0.0f, 0.0f, 0.0f));
		fragmentParams->setNamedConstant("lightSpecular" + is, Ogre::Vector4(0.0f, 0.0f, 0.0f, 0.0f));
		fragmentParams->setNamedConstant("lightAttenuation" + is, Ogre::Vector4(0.0f, 0.0f, 0.0f, 0.0f));
		fragmentParams->setNamedConstant("lightPosition" + is, Ogre::Vector4(0.0f, 0.0f, 0.0f, 0.0f));
		fragmentParams->setNamedConstant("lightDirection" + is, Ogre::Vector4(0.0f, 0.0f, 0.0f, 0.0f));
		fragmentParams->setNamedConstant("lightSpotParams" + is, Ogre::Vector4(0.0f, 0.0f, 0.0f, 0.0f));
	}
#endif



	// Create the material
	/*

	material TestBug
	{
		technique
		{
			pass
			{
				vertex_program_ref TestBug_VS
				{
				}

				fragment_program_ref TestBug_PS
				{
				}

				texture_unit DiffuseMap
				{
					texture nskingr.jpg // Ninja
				}
			}
		}
	}

	*/

	// Create a material and set its techniques and passes
	Ogre::MaterialManager& matManager = Ogre::MaterialManager::getSingleton();
	auto material = matManager.create(
		"TestBug",
		Ogre::ResourceGroupManager::DEFAULT_RESOURCE_GROUP_NAME);
	auto technique = material->getTechnique(0);
	auto pass = technique->getPass(0);

	// Set vertex and fragment programs to the pass
	pass->setVertexProgram(vertexProgram->getName());
	pass->setFragmentProgram(fragmentProgram->getName());
	pass->createTextureUnitState("nskingr.jpg");
	material->load();



	const float tmpSize = 70.0f;
	const float tmpStepSize = 2.0f;
	for (float x = -tmpSize; x < tmpSize; x += tmpStepSize)
	{
		for (float y = -tmpSize; y < tmpSize; y += tmpStepSize)
		{
			Entity* tmpEntity = mSceneMgr->createEntity("ogrehead.mesh");
			tmpEntity->setMaterialName("TestBug");
			SceneNode* tmpNode = mSceneMgr->getRootSceneNode()->createChildSceneNode();
			tmpNode->attachObject(tmpEntity);
			tmpNode->setPosition(Vector3(x, 0.0f, y));
			tmpNode->setScale(Vector3(0.1f, 0.02f, 0.1f));

			tmpEntities.push_back(tmpEntity);
			tmpSceneNodes.push_back(tmpNode);
		}
	}

	float tmpDistance = 2.0f;
	__SetPosition(mCamera, Vector3(tmpDistance, tmpDistance, tmpDistance));
	__LookAt(mCamera, Vector3::ZERO);
	mCamera->setNearClipDistance(0.01f);
	mCamera->setFarClipDistance(1000.0f);

	mCameraMan->setStyle(CS_MANUAL);

	MaterialManager::getSingleton().setDefaultTextureFiltering(Ogre::TFO_ANISOTROPIC);
	MaterialManager::getSingleton().setDefaultAnisotropy(16);

	mTrayMgr->showCursor();
}

void cleanupContent() override
{
	for (size_t i = 0; i < tmpEntities.size(); i++)
	{
		tmpEntities[i]->detachFromParent();
		mSceneMgr->destroyEntity(tmpEntities[i]);
	}
	tmpEntities.clear();

	for (size_t i = 0; i < tmpSceneNodes.size(); i++)
	{
		mSceneMgr->destroySceneNode(tmpSceneNodes[i]);
	}
	tmpSceneNodes.clear();
}
};

#endif
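
As a side note, the six near-identical loops that declare the per-light uniforms in the non-array path could be folded into one helper. This is only a minimal sketch: `declareLightUniforms` and `lightUniformNames` are hypothetical names that are not part of the code above, and the sketch assumes the declaration order (all diffuse entries first, then all specular entries, and so on) should stay the same.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper: emits "float4 <name><i>;" for every name and index,
// grouped by name, in the same order as the six loops in setupContent().
std::string declareLightUniforms(const std::vector<std::string>& names, int count)
{
    std::string out;
    for (const std::string& name : names)
        for (int i = 0; i < count; ++i)
            out += "float4 " + name + std::to_string(i) + ";\n";
    return out;
}

// The six parameter families used by the sample shader.
std::vector<std::string> lightUniformNames()
{
    return { "lightDiffuse", "lightSpecular", "lightAttenuation",
             "lightPosition", "lightDirection", "lightSpotParams" };
}
```

In the sample it would be used as `tmpSource += declareLightUniforms(lightUniformNames(), 20);` followed by the existing `tmpSource += "\n\n\n";`.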

And here is a standalone build that shows the problem, if needed:
https://drive.google.com/file/d/1aWuqvJ ... sp=sharing

Same instructions as before ( shown in viewtopic.php?p=556971#p556971 ).

I tried this latest showcase version on more computers, and all of them have major issues.
Here are the results (1920x1080, Windowed, VSync off):

Format: Before bug FPS -> After bug FPS (by toggling numlock a couple of times)

NVIDIA GeForce GTX 1050 Ti (my computer):

D3D9: 735 FPS -> 101 FPS (86% slower)
D3D11: 190 FPS -> 104 FPS (45% slower)


NVIDIA GeForce RTX 3050 Laptop GPU (my laptop):

D3D9: 410 FPS -> 167 FPS (59% slower)
D3D11: 300 FPS -> 177 FPS (41% slower)


NVIDIA GeForce GTX 1650 (my mother's laptop):

D3D9: 230 FPS -> 133 FPS (42% slower)
D3D11: 245 FPS -> 140 FPS (43% slower)


NVIDIA GeForce RTX 4060 (my brother's computer):

D3D9: 820 FPS -> 380 FPS (53% slower)
D3D11: 670 FPS -> 400 FPS (40% slower)


NVIDIA GeForce GTX 960 4GB Twin Frozr (my brother's computer #2):

D3D9: 645 FPS -> 100 FPS (84% slower)
D3D11: 175 FPS -> 102 FPS (42% slower)
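
To put these numbers in perspective, it can help to convert FPS into time per frame, since a fixed extra cost per frame looks much worse at high FPS. A minimal sketch of the arithmetic (the helper names are made up for this illustration and are not part of Ogre):

```cpp
#include <cassert>
#include <cmath>

// Percentage by which the frame rate dropped, e.g. 735 -> 101 FPS.
double percentSlower(double fpsBefore, double fpsAfter)
{
    return (fpsBefore - fpsAfter) / fpsBefore * 100.0;
}

// Time spent per frame in milliseconds at a given frame rate.
double frameTimeMs(double fps)
{
    return 1000.0 / fps;
}

// Extra milliseconds each frame costs after the drop.
double addedFrameTimeMs(double fpsBefore, double fpsAfter)
{
    return frameTimeMs(fpsAfter) - frameTimeMs(fpsBefore);
}
```

For the GTX 1050 Ti this gives about 8.5 ms of extra time per frame under D3D9 and about 4.4 ms under D3D11, with both ending up near 10 ms per frame after the drop.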


A few additional notes:

  • It does not matter whether you use arrays for the lights or not (arrays are enabled with USE_LIGHT_ARRAYS in the code; the right mouse button then creates/destroys the lights instead of Num Lock).

  • It does not matter whether you use real lights or not. The only thing that matters is that the shader parameters change.

  • On some computers, the FPS drop happens even with very few light parameters (about 7 lights).

  • The above tests were only done on computers with NVIDIA graphics cards. I could not find any non-NVIDIA computers.

sercero
Bronze Sponsor
Posts: 487
Joined: Sun Jan 18, 2015 4:20 pm
Location: Buenos Aires, Argentina
x 170

Re: Light Creation/Deletion Performance Bug

Post by sercero »

Hello @rpgplayerrobin,

Here is the OGRE config:
Image

This is how the CubeMapping demo looks (I'm using your latest package "LightBugCompletePackage3") with Num Lock off (55 FPS):
Image

This is how it looks with Num Lock on (55 FPS):
Image

Is that how the demo is supposed to look?

It seems that the FPS is not changing between the two modes.

What is strange is that in this latest version the FPS are lower in general.

rpgplayerrobin
Orc Shaman
Posts: 710
Joined: Wed Mar 18, 2009 3:03 am
x 391

Re: Light Creation/Deletion Performance Bug

Post by rpgplayerrobin »

The problem there is most likely that the graphics card is an integrated one, which has horrible performance either way.
Even a 10-year-old computer in my tests had 175 FPS in that scene (yours had 55 FPS), so the bug might not show at all on integrated graphics cards that are too slow.

Since @paroj has an NVIDIA RTX 4070, the new test I made might show the bug on that graphics card, as all NVIDIA graphics cards have shown the bug so far.

paroj
OGRE Team Member
OGRE Team Member
Posts: 2128
Joined: Sun Mar 30, 2014 2:51 pm
x 1141

Re: Light Creation/Deletion Performance Bug

Post by paroj »

I can indeed see an FPS change now:
D3D11: 1370FPS -> 1035FPS
D3D9: 1290FPS -> 1290FPS -> 1000FPS (only happens after 2nd toggle)

I assume this triggers some heuristic in the driver, where the UBO is migrated from GPU-only to CPU-visible memory for faster data updates.

You could try running the SampleBrowser with: https://developer.nvidia.com/nsight-graphics to get more insight.

Unfortunately I only have it on Linux right now as that is my main development environment.

rpgplayerrobin
Orc Shaman
Posts: 710
Joined: Wed Mar 18, 2009 3:03 am
x 391

Re: Light Creation/Deletion Performance Bug

Post by rpgplayerrobin »

I have already attempted to use Nsight, RenderDoc, PIX, Visual Studio Graphics Debugger and GPUView.

None of them except GPUView shows any difference between the normal state and the state after the bug has occurred.
Here is the GPUView screenshot of the issue, clearly showing that something is wrong:
Image

I have also tried a lot of things since my last post, but nothing has solved this bug yet.

Also, the problem does not require many objects in the scene: even with just 9 Ogre heads, the bug shows itself very clearly, lowering the FPS from 1300 to 150 on my computer.
So this has nothing to do with bandwidth or anything like that; something very strange is going on, so far on 6 different computers, all using NVIDIA.
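
One rough way to check that the slowdown does not scale with the amount of work is to compare the added time per frame in the 9-head scene with the full sample scene, where setupContent() creates 70 x 70 = 4900 heads. This is only a sketch under assumptions: the helper name is hypothetical, not all 4900 heads are necessarily in view, and it assumes the 735 -> 101 FPS D3D9 result from my computer is a fair comparison for the 1300 -> 150 FPS figure.

```cpp
#include <cassert>
#include <cmath>

// Extra milliseconds per frame after the FPS drop, divided by the number
// of objects in the scene. Pass objectCount = 1 to get the total per frame.
double addedCostPerObjectMs(double fpsBefore, double fpsAfter, int objectCount)
{
    const double addedMs = 1000.0 / fpsAfter - 1000.0 / fpsBefore;
    return addedMs / objectCount;
}
```

With these numbers the 9-head scene gains about 5.9 ms per frame and the 4900-head scene about 8.5 ms, so the added cost per object differs by a factor of several hundred. That is consistent with the slowdown not being tied to the amount of data sent.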

paroj
OGRE Team Member
OGRE Team Member
Posts: 2128
Joined: Sun Mar 30, 2014 2:51 pm
x 1141

Re: Light Creation/Deletion Performance Bug

Post by paroj »

It looks like it still does the same amount of work, just slower. This supports my idea that HBU_GPU_ONLY is ignored and replaced by HBU_CPU_TO_GPU by the driver.

Could you force* the UBO updates to take this code-path and see whether it makes a difference?
https://github.com/OGRECave/ogre/blob/0 ... r.cpp#L151

*=e.g. force a shadow buffer on creation.

rpgplayerrobin
Orc Shaman
Posts: 710
Joined: Wed Mar 18, 2009 3:03 am
x 391

Re: Light Creation/Deletion Performance Bug

Post by rpgplayerrobin »

paroj wrote:

This supports my idea that HBU_GPU_ONLY is ignored and replaced by HBU_CPU_TO_GPU by the driver.

But all constant buffers are created with HBU_CPU_TO_GPU in the code, so when is HBU_GPU_ONLY supposed to be used? It is never actually used in updateDefaultUniformBuffer.
This is how the buffer is created there:
createUniformBuffer(size, HBU_CPU_TO_GPU, false)

paroj wrote:

Could you force* the UBO updates to take this code-path and see whether it makes a difference?

I tried many types of shadow buffers, but all of them have worse FPS by default than when the bug actually occurs, so they only make things slower. As with my own staging buffer, the bug does disappear with them, but since the result is slower by default than when the bug happens, it is not really useful.
I tried all of these variants when creating the buffer:

  1. createUniformBuffer(size, HBU_CPU_TO_GPU, false)
    This is the normal code, which runs at around 190 FPS until the bug lowers it to 105 FPS.

  2. createUniformBuffer(size, HBU_GPU_ONLY, false)
    This does not create a shadow buffer on creation but creates one in writeData instead, which takes the code path you linked. It is not affected by the bug, but it always runs at 90 FPS, which makes it worse than #1 even after the bug has happened.

  3. createUniformBuffer(size, HBU_GPU_ONLY, true)
    This creates a shadow buffer on creation and locks that later, so it does not take the code path you linked.
    It has the exact same result as #2.

  4. createUniformBuffer(size, HBU_CPU_TO_GPU, true)
    This has the exact same result as #2 and #3.

I also tried not discarding the data in writeData, but that made the FPS even worse.
I also tried HBU_CPU_ONLY, but that crashes.

Using a staging buffer for a single frame, even on just a single object, fixes the bug. It is as if it needs to "re-adjust" its timings, with the staging buffer forcing the GPU to wait a bit so that the CPU->GPU sync points line up correctly again.
But it is impossible to tell when the bug has happened, and doing it every frame, even for just one object, lowers the FPS by a lot (to below the FPS when the bug is active).