[2.2] Shader cache issue Topic is solved



Lax
Gnoll
Posts: 683
Joined: Mon Aug 06, 2007 12:53 pm
Location: Saarland, Germany
x 65

[2.2] Shader cache issue

Post by Lax »

Hi,

I have an issue with the shader caches. I use the code from the Ogre2.2 examples:

Code: Select all

if (this->root->getRenderSystem() && Ogre::GpuProgramManager::getSingletonPtr())
		{
			Ogre::HlmsManager* hlmsManager = this->root->getHlmsManager();
			Ogre::HlmsDiskCache diskCache(hlmsManager);

			Ogre::ArchiveManager& archiveManager = Ogre::ArchiveManager::getSingleton();

			Ogre::Archive* rwAccessFolderArchive = archiveManager.load(this->writeAccessFolder, "FileSystem", false);

			Ogre::TextureGpuManager* textureManager = Ogre::Root::getSingletonPtr()->getRenderSystem()->getTextureGpuManager();
			if (textureManager)
			{
				Ogre::String jsonString;
				textureManager->exportTextureMetadataCache(jsonString);
				const Ogre::String path = this->writeAccessFolder + "/NOWA_Engine.json";
				std::ofstream file(path.c_str(), std::ios::binary | std::ios::out);
				if (file.is_open())
					file.write(jsonString.c_str(), static_cast<std::streamsize>(jsonString.size()));
				file.close();
			}

			for (size_t i = Ogre::HLMS_LOW_LEVEL + 1u; i < Ogre::HLMS_MAX; ++i)
			{
				Ogre::Hlms* hlms = hlmsManager->getHlms(static_cast<Ogre::HlmsTypes>(i));
				if (hlms)
				{
					diskCache.copyFrom(hlms);

					Ogre::DataStreamPtr diskCacheFile = rwAccessFolderArchive->create("NOWA_Engine.bin");
					diskCache.saveTo(diskCacheFile);
				}
			}

			if (true == Ogre::GpuProgramManager::getSingleton().isCacheDirty())
			{
				try
				{
					// Save shader cache
					const Ogre::String filename = "NOWA_Engine.cache";
					Ogre::DataStreamPtr shaderCacheFile = rwAccessFolderArchive->create(filename);
					Ogre::GpuProgramManager::getSingleton().saveMicrocodeCache(shaderCacheFile);
				}
				catch (std::exception&)
				{
					Ogre::LogManager::getSingletonPtr()->logMessage(Ogre::LML_CRITICAL, "[Core]: Something went wrong during GPU shader cache saving.");
				}
			}

			archiveManager.unload(this->writeAccessFolder);
		}
But each time I start the simulation, the graphics stall, which makes my game unusable. See the video:
http://www.lukas-kalinowski.com/Homepag ... aching.mp4
So when I start the application, the graphics stall. When I restart the simulation within the application, the shaders seem to be cached and the simulation runs smoothly. But when I exit the application and compare the "NOWA_Design.cache" file against a previously created cache file, nothing has changed.
When I start the application again, the graphics stall again.

I tried a lot of things, because at first I thought the issue was caused by textures that are too big, etc. I ruled out a lot of things, and what remains is the shader cache. I think it does not store the shaders correctly.

I hope somebody has an idea how to fix this issue, or what else I could try...

I'm writing here in the forum because I'm out of ideas. This issue has existed ever since I started using shader caching, so it is not new in Ogre 2.2; it also existed in Ogre 2.1.

Best Regards
Lax

http://www.lukas-kalinowski.com/Homepage/?page_id=1631
Please support Second Earth Technic Base built of Lego bricks for Lego ideas: https://ideas.lego.com/projects/81b9bd1 ... b97b79be62

rujialiu
Goblin
Posts: 296
Joined: Mon May 09, 2016 8:21 am
x 35

Re: [2.2] Shader cache issue

Post by rujialiu »

Lax wrote: Sat Oct 12, 2019 10:07 am I'm writing here in the forum because I'm out of ideas. This issue has existed ever since I started using shader caching, so it is not new in Ogre 2.2; it also existed in Ogre 2.1.
To be honest, I suspect there is a bug in writing and/or reading the shader cache... but I don't have a repro, so I never asked here. Your situation looks much more reproducible, so I think you're in a good position to debug it yourself. From my understanding, the cache is simply a "source code -> compiled microcode" map, so either Ogre produced (possibly slightly) different source code when it should have been identical, or there is a bug in updating the cache (maybe it was updated in memory but never written to disk?) or in reading the cache.

You can write a simple cache dump utility (dumping the key, i.e. the source code, should be enough) and log the source BEFORE each D3DCompile function call (I suppose you're using the D3D11 RS). Here is a hypothetical debugging scenario:

1. The cache contains source code A, B, C before starting your game.
2. During the game, D3DCompile() is called for source code B, D, E. That means cache reading is buggy (B doesn't need to be compiled again).
3. After your game ends, the cache contains source code A, B, C, D. That means cache writing is buggy, because E was compiled but not written to disk.

Well, I just now realized that compiler flags should also be taken into account, so you need to work with "source code + compiler flags" instead.
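
The comparison in the scenario above could be automated roughly like this. It is only a sketch: CacheKey, CacheDiagnosis and diagnoseCache are made-up names, and it assumes you have already dumped the cache keys before and after the run and logged every key passed to D3DCompile during the run.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical key type: shader source code plus compiler flags, as one string.
using CacheKey = std::string;

struct CacheDiagnosis
{
    std::vector<CacheKey> recompiledDespiteCached;  // hints at a cache READ bug
    std::vector<CacheKey> compiledButNotSaved;      // hints at a cache WRITE bug
};

// Compares the keys on disk before the run, the keys compiled during the run,
// and the keys on disk after the run.
CacheDiagnosis diagnoseCache( const std::set<CacheKey> &onDiskBefore,
                              const std::set<CacheKey> &compiledThisRun,
                              const std::set<CacheKey> &onDiskAfter )
{
    CacheDiagnosis result;
    for( const CacheKey &key : compiledThisRun )
    {
        // Already cached, yet compiled again: reading the cache failed.
        if( onDiskBefore.count( key ) != 0u )
            result.recompiledDespiteCached.push_back( key );
        // Compiled, yet missing from the saved cache: writing the cache failed.
        if( onDiskAfter.count( key ) == 0u )
            result.compiledButNotSaved.push_back( key );
    }
    return result;
}

// The A/B/C/D/E scenario from the list above.
void checkScenarioFromPost()
{
    const CacheDiagnosis d =
        diagnoseCache( { "A", "B", "C" }, { "B", "D", "E" }, { "A", "B", "C", "D" } );
    assert( d.recompiledDespiteCached == std::vector<CacheKey>{ "B" } );
    assert( d.compiledButNotSaved == std::vector<CacheKey>{ "E" } );
}
```

If both lists come back empty, the microcode cache is probably doing its job and the stall comes from somewhere else.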

Good luck!
Lax
Gnoll
Posts: 683
Joined: Mon Aug 06, 2007 12:53 pm
Location: Saarland, Germany
x 65

Re: [2.2] Shader cache issue

Post by Lax »

Hi,

Thanks for your response. I'm trying to debug everything, but it's really hard to understand what is going on.
Each time the stall begins, I see that this shader code is loaded:

Code: Select all


//#include "SyntaxHighlightingMisc.h"


#define ushort uint
#define ogre_float4x3 float4x3

//Short used for read operations. It's an int in GLSL & HLSL. An ushort in Metal
#define rshort2 int2
#define rint int
//Short used for write operations. It's an int in GLSL. An ushort in HLSL & Metal
#define wshort2 uint2
#define wshort3 uint3

#define toFloat3x3( x ) ((float3x3)(x))
#define buildFloat3x3( row0, row1, row2 ) transpose( float3x3( row0, row1, row2 ) )

#define min3( a, b, c ) min( a, min( b, c ) )
#define max3( a, b, c ) max( a, max( b, c ) )

#define INLINE
#define NO_INTERPOLATION_PREFIX nointerpolation
#define NO_INTERPOLATION_SUFFIX

#define finalDrawId input.drawId
#define PARAMS_ARG_DECL
#define PARAMS_ARG

#define floatBitsToUint(x) asuint(x)
#define uintBitsToFloat(x) asfloat(x)
#define floatBitsToInt(x) asint(x)
#define fract frac
#define lessThan( a, b ) (a < b)

#define inVs_vertexId input.vertexId
#define inVs_vertex input.vertex
#define inVs_blendWeights input.blendWeights
#define inVs_blendIndices input.blendIndices
#define inVs_qtangent input.qtangent

	#define inVs_drawId input.drawId


	#define inVs_uv0 input.uv0
	#define inVs_uv1 input.uv1
#define outVs_Position outVs.gl_Position
#define outVs_viewportIndex outVs.gl_ViewportIndex
#define outVs_clipDistance0 outVs.gl_ClipDistance0

#define gl_SampleMaskIn0 gl_SampleMask
#define interpolateAtSample( interp, subsample ) EvaluateAttributeAtSample( interp, subsample )
#define findLSB firstbitlow

#define outPs_colour0 outPs.colour0
#define OGRE_Sample( tex, sampler, uv ) tex.Sample( sampler, uv )
#define OGRE_SampleLevel( tex, sampler, uv, lod ) tex.SampleLevel( sampler, uv, lod )
#define OGRE_SampleArray2D( tex, sampler, uv, arrayIdx ) tex.Sample( sampler, float3( uv, arrayIdx ) )
#define OGRE_SampleArray2DLevel( tex, sampler, uv, arrayIdx, lod ) tex.SampleLevel( sampler, float3( uv, arrayIdx ), lod )
#define OGRE_SampleArrayCubeLevel( tex, sampler, uv, arrayIdx, lod ) tex.SampleLevel( sampler, float4( uv, arrayIdx ), lod )
#define OGRE_SampleGrad( tex, sampler, uv, ddx, ddy ) tex.SampleGrad( sampler, uv, ddx, ddy )
#define OGRE_SampleArray2DGrad( tex, sampler, uv, arrayIdx, ddx, ddy ) tex.SampleGrad( sampler, float3( uv, arrayIdx ), ddx, ddy )
#define OGRE_ddx( val ) ddx( val )
#define OGRE_ddy( val ) ddy( val )
#define OGRE_Load2D( tex, iuv, lod ) tex.Load( int3( iuv, lod ) )
#define OGRE_Load2DMS( tex, iuv, subsample ) tex.Load( iuv, subsample )

#define OGRE_Load3D( tex, iuv, lod ) tex.Load( int4( iuv, lod ) )

#define bufferFetch( buffer, idx ) buffer.Load( idx )
#define bufferFetch1( buffer, idx ) buffer.Load( idx ).x

#define structuredBufferFetch( buffer, idx ) buffer[idx]

#define OGRE_Texture3D_float4 Texture3D

#define OGRE_SAMPLER_ARG_DECL( samplerName ) , SamplerState samplerName
#define OGRE_SAMPLER_ARG( samplerName ) , samplerName

#define CONST_BUFFER( bufferName, bindingPoint ) cbuffer bufferName : register(b##bindingPoint)
#define CONST_BUFFER_STRUCT_BEGIN( structName, bindingPoint ) cbuffer structName : register(b##bindingPoint) { struct _##structName
#define CONST_BUFFER_STRUCT_END( variableName ) variableName; }

#define FLAT_INTERPOLANT( decl, bindingPoint ) nointerpolation decl : TEXCOORD##bindingPoint
#define INTERPOLANT( decl, bindingPoint ) decl : TEXCOORD##bindingPoint

#define OGRE_OUT_REF( declType, variableName ) out declType variableName
#define OGRE_INOUT_REF( declType, variableName ) inout declType variableName

#define OGRE_ARRAY_START( type ) {
#define OGRE_ARRAY_END }



	
		#define worldViewMat passBuf.view
	

	
float4x4 UNPACK_MAT4( Buffer<float4> matrixBuf, uint pixelIdx )
{
	float4 row1 = matrixBuf.Load( int((pixelIdx) << 2u) );
	float4 row2 = matrixBuf.Load( int(((pixelIdx) << 2u) + 1u) );
	float4 row3 = matrixBuf.Load( int(((pixelIdx) << 2u) + 2u) );
	float4 row4 = matrixBuf.Load( int(((pixelIdx) << 2u) + 3u) );

	return transpose( float4x4( row1, row2, row3, row4 ) );
}

	
float4x3 UNPACK_MAT4x3( Buffer<float4> matrixBuf, uint pixelIdx )
{
	float4 row1 = matrixBuf.Load( int((pixelIdx) << 2u) );
	float4 row2 = matrixBuf.Load( int(((pixelIdx) << 2u) + 1u) );
	float4 row3 = matrixBuf.Load( int(((pixelIdx) << 2u) + 2u) );

	return transpose( float3x4( row1, row2, row3 ) );
}


	// START UNIFORM DECLARATION
	
struct ShadowReceiverData
{
	float4x4 texViewProj;
	float2 shadowDepthRange;
	float2 padding;
	float4 invShadowMapSize;
};

struct Light
{
			float3 position;
		uint lightMask;
		float4 diffuse;		//.w contains numNonCasterDirectionalLights
	float3 specular;
};

#define numNonCasterDirectionalLights lights[0].diffuse.w

#define areaLightDiffuseMipmapStart areaApproxLights[0].diffuse.w
#define areaLightNumMipmapsSpecFactor areaApproxLights[0].specular.w

#define numAreaApproxLights areaApproxLights[0].doubleSided.y
#define numAreaApproxLightsWithMask areaApproxLights[0].doubleSided.z

#define numAreaLtcLights areaLtcLights[0].points[0].w
#define numAreaLtcLights areaLtcLights[0].points[0].w

struct AreaLight
{
			float3 position;
		uint lightMask;
		float4 diffuse;		//[0].w contains diffuse mipmap start
	float4 specular;	//[0].w contains mipmap scale
	float4 attenuation;	//.w contains texture array idx
	//Custom 2D Shape:
	//  direction.xyz direction
	//  direction.w invHalfRectSize.x
	//  tangent.xyz tangent
	//  tangent.w invHalfRectSize.y
	float4 direction;
	float4 tangent;
	float4 doubleSided;	//.y contains numAreaApproxLights
						//.z contains numAreaApproxLightsWithMask
	};

struct AreaLtcLight
{
			float3 position;
		uint lightMask;
		float4 diffuse;			//.w contains attenuation range
	float4 specular;		//.w contains doubleSided
	float4 points[4];		//.w contains numAreaLtcLights
							//points[1].w, points[2].w, points[3].w contain obbFadeFactorLtc.xyz
	};




//Uniforms that change per pass
CONST_BUFFER_STRUCT_BEGIN( PassBuffer, 0 )
{
	//Vertex shader (common to both receiver and casters)

	float4x4 viewProj;




	//Vertex shader
	float4x4 view;
	
	
	//-------------------------------------------------------------------------

	//Pixel shader
	float3x3 invViewMatCubemap;

	float padding; //Compatibility with GLSL

	float4 pccVctMinDistance_invPccVctInvDistance_rightEyePixelStartX_unused;



	float4 ambientUpperHemi;

	float4 ambientLowerHemi;
	float4 ambientHemisphereDir;


	Light lights[2];		

	//Forward3D
	//f3dData.x = minDistance;
	//f3dData.y = invMaxDistance;
	//f3dData.z = f3dNumSlicesSub1;
	//f3dData.w = uint cellsPerTableOnGrid0 (floatBitsToUint);

	//Clustered Forward:
	//f3dData.x = minDistance;
	//f3dData.y = invExponentK;
	//f3dData.z = f3dNumSlicesSub1;
	//f3dData.w = renderWindow->getHeight();
	float4 f3dData;
		
		float4 fwdScreenToGrid;
	
	


	

	

#define pccVctMinDistance		pccVctMinDistance_invPccVctInvDistance_rightEyePixelStartX_unused.x
#define invPccVctInvDistance	pccVctMinDistance_invPccVctInvDistance_rightEyePixelStartX_unused.y
#define rightEyePixelStartX		pccVctMinDistance_invPccVctInvDistance_rightEyePixelStartX_unused.z
}
CONST_BUFFER_STRUCT_END( passBuf );

	
		//Uniforms that change per Item/Entity
		CONST_BUFFER( InstanceBuffer, 2 )
		{
			//.x =
			//The lower 9 bits contain the material's start index.
			//The higher 23 bits contain the world matrix start index.
			//
			//.y =
			//shadowConstantBias. Send the bias directly to avoid an
			//unnecessary indirection during the shadow mapping pass.
			//Must be loaded with uintBitsToFloat
			//
			//.z =
			//lightMask. Ogre must have been compiled with OGRE_NO_FINE_LIGHT_MASK_GRANULARITY
			
				uint4 worldMaterialIdx[2];
					};
	
	
	// END UNIFORM DECLARATION

	



struct VS_INPUT
{
	float4 vertex : POSITION;
	float3 normal : NORMAL;



	float3 tangent	: TANGENT;
	



	uint4 blendIndices	: BLENDINDICES;
	float4 blendWeights : BLENDWEIGHT;



	float2 uv0 : TEXCOORD0;
	float4 uv1 : TEXCOORD1;
	uint drawId : DRAWID;
	
};

struct PS_INPUT
{
	
	
					
				FLAT_INTERPOLANT( ushort drawId, 0 );
					
		
			INTERPOLANT( float3 pos, 1 );
			INTERPOLANT( float3 normal, 2 );
			
				INTERPOLANT( float3 tangent, 3 );
											
			INTERPOLANT( float2 uv0, 4 );
			INTERPOLANT( float4 uv1, 5 );
						
				

	float4 gl_Position: SV_Position;

	

	
	
	
	
};

// START UNIFORM D3D DECLARATION
Buffer<float4> worldMatBuf : register(t0);
// END UNIFORM D3D DECLARATION

PS_INPUT main( VS_INPUT input )
{
	PS_INPUT outVs;

	float3 normal	= input.normal;
	float3 tangent	= input.tangent;
	


	
	
	

	

	
	uint _idx = (inVs_blendIndices[0] << 1u) + inVs_blendIndices[0]; //inVs_blendIndices[0] * 3u; a 32-bit int multiply is 4 cycles on GCN! (and mul24 is not exposed to GLSL...)
	uint matStart = worldMaterialIdx[inVs_drawId].x >> 9u;
	float4 worldMat[3];
	worldMat[0] = bufferFetch( worldMatBuf, int(matStart + _idx + 0u) );
	worldMat[1] = bufferFetch( worldMatBuf, int(matStart + _idx + 1u) );
	worldMat[2] = bufferFetch( worldMatBuf, int(matStart + _idx + 2u) );
	float4 worldPos;
	worldPos.x = dot( worldMat[0], inVs_vertex );
	worldPos.y = dot( worldMat[1], inVs_vertex );
	worldPos.z = dot( worldMat[2], inVs_vertex );
	worldPos.xyz *= inVs_blendWeights[0];
	
		float3 worldNorm;
		worldNorm.x = dot( worldMat[0].xyz, normal );
		worldNorm.y = dot( worldMat[1].xyz, normal );
		worldNorm.z = dot( worldMat[2].xyz, normal );
		worldNorm *= inVs_blendWeights[0];
	
	
		float3 worldTang;
		worldTang.x = dot( worldMat[0].xyz, tangent );
		worldTang.y = dot( worldMat[1].xyz, tangent );
		worldTang.z = dot( worldMat[2].xyz, tangent );
		worldTang *= inVs_blendWeights[0];
	

	
	
		float4 tmp;
		tmp.w = 1.0;
	//!NeedsMoreThan1BonePerVertex
	
		_idx = (inVs_blendIndices[1] << 1u) + inVs_blendIndices[1]; //inVs_blendIndices[1] * 3; a 32-bit int multiply is 4 cycles on GCN! (and mul24 is not exposed to GLSL...)
		worldMat[0] = bufferFetch( worldMatBuf, int(matStart + _idx + 0u) );
		worldMat[1] = bufferFetch( worldMatBuf, int(matStart + _idx + 1u) );
		worldMat[2] = bufferFetch( worldMatBuf, int(matStart + _idx + 2u) );
		tmp.x = dot( worldMat[0], inVs_vertex );
		tmp.y = dot( worldMat[1], inVs_vertex );
		tmp.z = dot( worldMat[2], inVs_vertex );
		worldPos.xyz += (tmp * inVs_blendWeights[1]).xyz;
		
			tmp.x = dot( worldMat[0].xyz, normal );
			tmp.y = dot( worldMat[1].xyz, normal );
			tmp.z = dot( worldMat[2].xyz, normal );
			worldNorm += tmp.xyz * inVs_blendWeights[1];
		
		
			tmp.x = dot( worldMat[0].xyz, tangent );
			tmp.y = dot( worldMat[1].xyz, tangent );
			tmp.z = dot( worldMat[2].xyz, tangent );
			worldTang += tmp.xyz * inVs_blendWeights[1];
		
	

	worldPos.w = 1.0;

	
	
	//Lighting is in view space
		outVs.pos		= mul( worldPos, worldViewMat ).xyz;
		outVs.normal	= mul( worldNorm, toFloat3x3( worldViewMat ) );
							outVs.tangent	= mul( worldTang, toFloat3x3( worldViewMat ) );
	
        
			
				outVs_Position = mul( worldPos, passBuf.viewProj );
			
		
	


	
		
		
			
	

	/// hlms_uv_count will be 0 on shadow caster passes w/out alpha test
	
		outVs.uv0 = inVs_uv0;
		outVs.uv1 = inVs_uv1;


	
		outVs.drawId = inVs_drawId;
	


	

	

	

	

	return outVs;
}
_vs_5_0_
But each time, in OgreHlms in the function "getMaterial", for:

Code: Select all

lastReturnedValue = this->getShaderCache( finalHash );

lastReturnedValue is null, so the shader gets compiled. Then, in OgreD3D11HLSLProgram.cpp, in the function:

Code: Select all

void D3D11HLSLProgram::loadFromSource(void)
    {
        if ( GpuProgramManager::getSingleton().isMicrocodeAvailableInCache(getNameForMicrocodeCache()) )
        {
-->            getMicrocodeFromCache();
        }
        else
        {
            compileMicrocode();
        }
    }
"getMicrocodeFromCache();" is always called, because the entry already exists in the hash map, and "compileMicrocode();" never is. So nothing is ever added to the cache, and isCacheDirty is never set to "true". But why?

What are the criteria for something to be added to the cache?
Is the hash maybe calculated incorrectly, or is it by accident the same as another one?


dark_sylinc
OGRE Team Member
Posts: 5511
Joined: Sat Jul 21, 2007 4:55 pm
Location: Buenos Aires, Argentina
x 1379

Re: [2.2] Shader cache issue

Post by dark_sylinc »

I took a look at your 1st & 2nd posts.

What you saw is the microcode cache working (otherwise you could never land inside getMicrocodeFromCache).

However, your Hlms disk cache is not working.
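
To make the difference between the two caches concrete, here is a toy model of how I read the posts above. It is not Ogre code and every name in it is made up: layer 1 stands in for the lookup done by getShaderCache( finalHash ), layer 2 for the microcode cache. It assumes the microcode cache was restored from disk at startup while layer 1 was not.

```cpp
#include <cassert>
#include <map>
#include <string>

// Toy model only: every name here is hypothetical and none of it is Ogre code.
struct ToyShaderPipeline
{
    std::map<unsigned, std::string> shaderCache;        // layer 1: hash -> shader source
    std::map<std::string, std::string> microcodeCache;  // layer 2: shader source -> microcode
    bool microcodeCacheDirty = false;
    int sourcesGenerated = 0;
    int compiles = 0;

    // Called the first time a material with the given hash is needed.
    void requestShader( unsigned hash )
    {
        if( shaderCache.count( hash ) != 0u )
            return;  // layer 1 hit: no work at all

        // Layer 1 miss: the shader source has to be produced again.
        const std::string source = "source#" + std::to_string( hash );
        ++sourcesGenerated;

        // Layer 2: compile only if the microcode is not cached yet.
        if( microcodeCache.count( source ) == 0u )
        {
            microcodeCache[source] = "microcode(" + source + ")";
            ++compiles;
            microcodeCacheDirty = true;
        }
        shaderCache[hash] = source;
    }
};

// Second application start: microcode cache restored, layer 1 empty.
void secondRunExample()
{
    ToyShaderPipeline pipeline;
    pipeline.microcodeCache["source#42"] = "microcode(source#42)";

    pipeline.requestShader( 42u );

    assert( pipeline.sourcesGenerated == 1 );  // work still happens on first use
    assert( pipeline.compiles == 0 );          // but nothing is compiled
    assert( !pipeline.microcodeCacheDirty );   // so the dirty flag stays false
}
```

In this model the microcode cache never becomes dirty on the second run, which matches what was observed with isCacheDirty, even though layer 1 misses every time.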

After a careful look: you're saving/loading the Hlms disk cache incorrectly. You're doing:

Code: Select all

rwAccessFolderArchive->create("NOWA_Engine.bin");
However, you should be doing:

Code: Select all

Ogre::String filename = "NOWA_Engine" + Ogre::StringConverter::toString( i ) + ".bin";
rwAccessFolderArchive->create( filename );
Because there's one cache per active Hlms and you're overwriting all of them with the last one. Your NOWA_Engine.bin is 1 KB; however, it should be much bigger (and you should have several of them, at least two, maybe more).
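
The overwrite can be reproduced without Ogre. In the sketch below, FakeArchive is a plain map standing in for the writable archive, and hlmsCacheFilename and saveCaches are hypothetical helpers. The index simply counts from 0 here, whereas the real loop starts at Ogre::HLMS_LOW_LEVEL + 1u.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for a writable archive: file name -> file contents.
using FakeArchive = std::map<std::string, std::string>;

// Builds a distinct file name per Hlms type, e.g. "NOWA_Engine1.bin".
std::string hlmsCacheFilename( const std::string &prefix, std::size_t hlmsTypeIndex )
{
    return prefix + std::to_string( hlmsTypeIndex ) + ".bin";
}

// Writes one cache blob per Hlms, either all under the same name
// (the mistake) or under one name per Hlms (the fix).
FakeArchive saveCaches( const std::vector<std::string> &cachePerHlms, bool oneFilePerHlms )
{
    FakeArchive archive;
    for( std::size_t i = 0u; i < cachePerHlms.size(); ++i )
    {
        const std::string filename =
            oneFilePerHlms ? hlmsCacheFilename( "NOWA_Engine", i ) : "NOWA_Engine.bin";
        archive[filename] = cachePerHlms[i];  // same name => earlier data is lost
    }
    return archive;
}

void overwriteExample()
{
    const std::vector<std::string> caches = { "pbs cache", "unlit cache" };

    const FakeArchive shared = saveCaches( caches, false );
    assert( shared.size() == 1u );
    assert( shared.at( "NOWA_Engine.bin" ) == "unlit cache" );  // only the last one survives

    const FakeArchive separate = saveCaches( caches, true );
    assert( separate.size() == 2u );
    assert( separate.at( "NOWA_Engine0.bin" ) == "pbs cache" );
    assert( separate.at( "NOWA_Engine1.bin" ) == "unlit cache" );
}
```

Whatever naming scheme is used when saving has to be used again when loading, so that each Hlms gets its own file back.
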
Lax
Gnoll
Posts: 683
Joined: Mon Aug 06, 2007 12:53 pm
Location: Saarland, Germany
x 65

Re: [2.2] Shader cache issue

Post by Lax »

Hi dark_sylinc,

YES, that was exactly the issue. Due to careless/brainless copying of code on my side, I did not see that this code does not make any sense.

Thank you so much for tracking down the issue!

Best Regards
Lax
