Reading from HardwareBuffers

Problems building or running the engine, queries about how to use features etc.
chilly willy
Halfling
Posts: 65
Joined: Tue Jun 02, 2020 4:11 am
x 19

Reading from HardwareBuffers

Post by chilly willy »

I'm confused on reading from HardwareBuffers.

In RetrieveVertexData and Raycasting to the polygon level, getMeshInformation() reads vertices and indices from HardwareBuffers by calling HardwareBuffer::lock(HBL_READ_ONLY).

For HBL_READ_ONLY the docs say

Not allowed in buffers which are created with HBU_GPU_ONLY.

By default, a Mesh's HardwareBuffers are created with HBU_GPU_ONLY and no shadow buffers. Is it assumed that to use the getMeshInformation() code, buffers would have to be specified as a different usage or use shadow buffers?

In this post the buffers are explicitly created with HBU_STATIC_WRITE_ONLY (same as HBU_GPU_ONLY) and no shadow buffer. Then getMeshInformation() is called which calls lock(HBL_READ_ONLY) and it worked.

Does HBL_READ_ONLY actually work with HBU_GPU_ONLY? Is this old code that would no longer work? Would creating a shadow buffer make it work with HBU_GPU_ONLY or would we need a different HardwareBufferUsage?

rpgplayerrobin
Gnoll
Posts: 619
Joined: Wed Mar 18, 2009 3:03 am
x 353

Re: Reading from HardwareBuffers

Post by rpgplayerrobin »

Here is a post of mine regarding this (but with textures instead):
viewtopic.php?f=2&t=96608

As I understand it, with GPU_ONLY the data gets copied to CPU memory on demand.

When I debug my application (Ogre 1.12.13) while reading a mesh the same way you want to, I can see that mUsage is 5, which is HBU_STATIC_WRITE_ONLY (HBU_STATIC_WRITE_ONLY = HBU_GPU_ONLY).
Since my code works for this in all cases (at least for Direct3D9 and Direct3D11), it must work.

However, note that this usage variable is only for Ogre.
It does not translate that well to the actual render system.

When I debug a bit, I can see that when the mesh gets created with an mUsage of 5 (which is GPU_ONLY), the constructor D3D9HardwareBuffer::D3D9HardwareBuffer does not really use mUsage; it only checks whether it should discard the buffer, which in this case it will not.

The same kind of thing happens in D3D9HardwareBuffer::_lockBuffer: mUsage is never used there either, since D3D9Mappings::get always returns 0 in that case.

Then again, in D3D9HardwareBuffer::createBuffer, D3D9Mappings::get returns 8 for an mUsage of 5, which means write only (D3DUSAGE_WRITEONLY), and the GPU_ONLY flag is completely ignored.

So in short, I think the mUsage of 5 actually gets pretty much ignored, at least in D3D9 when I debug. It instead gets translated to write only (D3DUSAGE_WRITEONLY).

When locking it (in D3D9HardwareBuffer::_lockBuffer), the vertex buffer might automatically get copied by IDirect3DVertexBuffer9::Lock, since D3DUSAGE_WRITEONLY was used for meshes.
So that would probably be my answer: it automatically gets copied when locking, which is done by the specific render system (like D3D9).

@paroj could probably answer this much better though.

paroj
OGRE Team Member
Posts: 1994
Joined: Sun Mar 30, 2014 2:51 pm
x 1074

Re: Reading from HardwareBuffers

Post by paroj »

the docs are correct, but sometimes it will still work. E.g. on D3D11 there is code to explicitly handle the on-demand copy to CPU:
https://github.com/OGRECave/ogre/blob/4 ... #L144-L161

D3D9 and GL would generally handle this inside the Driver. On Vulkan this is not implemented, as it would only work before you start rendering a frame. Therefore, if you aim for portable code, you should not rely on this, but rather add shadow buffers when you intend to access the buffers on the CPU like this:
https://ogrecave.github.io/ogre/api/lat ... d17ef60daa
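
For example, enabling shadow buffers before the mesh data is loaded could look like this (a sketch assuming the Ogre 1.12+ API; "Athene.mesh" is just a placeholder name):

```cpp
#include <Ogre.h>

// Sketch: request shadow buffers *before* the mesh data is loaded, so later
// lock(HBL_READ_ONLY) calls read the CPU-side shadow copy on every render system.
Ogre::MeshPtr mesh = Ogre::static_pointer_cast<Ogre::Mesh>(
    Ogre::MeshManager::getSingleton()
        .createOrRetrieve("Athene.mesh",
            Ogre::ResourceGroupManager::DEFAULT_RESOURCE_GROUP_NAME)
        .first);
mesh->setVertexBufferPolicy(Ogre::HardwareBuffer::HBU_GPU_ONLY, true); // keep GPU usage, add shadow buffer
mesh->setIndexBufferPolicy(Ogre::HardwareBuffer::HBU_GPU_ONLY, true);
mesh->load(); // buffers created during load now honour the policy
```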

chilly willy
Halfling
Posts: 65
Joined: Tue Jun 02, 2020 4:11 am
x 19

Re: Reading from HardwareBuffers

Post by chilly willy »

Thank you, rpgplayerrobin and paroj.

rpgplayerrobin
Gnoll
Posts: 619
Joined: Wed Mar 18, 2009 3:03 am
x 353

Re: Reading from HardwareBuffers

Post by rpgplayerrobin »

I googled around a bit and saw multiple articles for D3D9 that talk about exactly this: using D3DUSAGE_WRITEONLY together with a read-only lock.
All of them say that the behaviour is undocumented and not recommended, as the memory there might be invalid.
However, some users noted that they have never seen it become invalid, but that it should still be avoided since it might happen on some hardware.
So this is an issue that needs to be solved by users in that case.

Let's say that you want to pick against all objects in your scene. Does that mean that setVertexBufferPolicy/setIndexBufferPolicy must be set to use shadow buffers for all meshes before they are loaded?

When I do this, the meshes that get loaded sometimes actually have their mUsage variable edited (for Direct3D9).

This is because when reading a mesh for Direct3D9, the function MeshSerializerImpl::readGeometryVertexBuffer calls D3D9HardwareBufferManager::createVertexBuffer, which disables shadow buffers and overwrites the usage to 1 (HBU_STATIC/HBU_GPU_TO_CPU).

In Direct3D11 for createVertexBuffer, it instead keeps its usage of 5 (HBU_STATIC_WRITE_ONLY/HBU_GPU_ONLY) and gets a real shadow buffer.

I also verified this by going into the lock function and placing a breakpoint when trying to read from the shadow buffer, which it only ever does for Direct3D11.

I guess this is intended, but it also means that in Direct3D9, meshes are not as optimized for rendering as they were before with usage 5 (HBU_STATIC_WRITE_ONLY/HBU_GPU_ONLY)?

Is there some way of cloning a mesh and temporarily reading just the vertex/index data, instead of always having the mesh use a shadow buffer, while also avoiding a usage of 1 (HBU_STATIC/HBU_GPU_TO_CPU) for Direct3D9?
That way, you could load the data into memory just once and then delete the temporary mesh, without all meshes possibly getting lower performance because of the setVertexBufferPolicy/setIndexBufferPolicy switch.

Trying to do that with MeshPtr::clone seems to be impossible, at least.
Setting setVertexBufferPolicy and setIndexBufferPolicy and then calling MeshPtr::reload makes the mesh lose all its data, so that option is not viable.

Anyway, my way of solving all these things is below.
However, I do not know if there is a better way.

The only way I could clone a mesh with the right policies was by loading the mesh in again from scratch into a temporary mesh, and then deleting that mesh when the information from it was read, using this function:

Code: Select all

MeshPtr CMeshToMesh::ConvertToMesh(CString filePath, CString nameAddition)
{
	// Setup the mesh to return
	MeshPtr tmpMeshPtr;
	tmpMeshPtr.reset();

	// Setup the file as a mesh
	FILE* pFile = NULL;
	fopen_s(&pFile, filePath.ToCharArray(), "rb");
	if(pFile)
	{
		struct stat tagStat;
		stat(filePath.ToCharArray(), &tagStat);
		MemoryDataStream* tmpMemoryDataStream = new MemoryDataStream(filePath, tagStat.st_size, true);
		fread((void*)tmpMemoryDataStream->getPtr(), tagStat.st_size, 1, pFile);
		fclose(pFile);

		CString tmpFileName = CGeneric::GetFileName(filePath);
		tmpFileName.Replace(".mesh", "_ConvertToMesh_" + nameAddition + CGeneric::GenerateUniqueName() + ".mesh");
		MeshPtr tmpNewMesh = MeshManager::getSingleton().createManual(tmpFileName, ResourceGroupManager::DEFAULT_RESOURCE_GROUP_NAME);
		CGeneric::HandleMeshLoad(tmpNewMesh);

		try
		{
			DataStreamPtr tmpDataStream(tmpMemoryDataStream);
			MeshSerializer tmpMeshSerializer;
			tmpMeshSerializer.importMesh(tmpDataStream, tmpNewMesh.get());
		}
		catch(...)
		{
			CGeneric::Destroy(tmpNewMesh);
		}

		tmpMeshPtr = tmpNewMesh;
	}

	// Return the mesh
	return tmpMeshPtr;
}

If you need to get the file path of only a file name (for example, meshPtr->getName()), then this function can be used to find the actual file path:

Code: Select all

// We need this in order to find the file path of a specific file name
CString CGeneric::GetResourceFilePathFromFileName(const CString& fileName)
{
	// Loop through all resource groups
	StringVector tmpStringVector = ResourceGroupManager::getSingleton().getResourceGroups();
	for (int i = 0; i < (int)tmpStringVector.size(); i++)
	{
		// Check if the texture exists in our current resource group
		const String& tmpResourceGroupName = tmpStringVector[i];
		ResourceGroupManager::LocationList tmpLocationList = Ogre::ResourceGroupManager::getSingleton().getResourceLocationList(tmpResourceGroupName);
		ResourceGroupManager::LocationList::iterator tmpItr = tmpLocationList.begin();
		ResourceGroupManager::LocationList::iterator tmpEnd = tmpLocationList.end();
		for (; tmpItr != tmpEnd; ++tmpItr)
		{
			CString tmpDirectory = tmpItr->archive->getName();
			StringVectorPtr tmpList = tmpItr->archive->list();
			for (int n = 0; n < tmpList->size(); n++)
			{
				CString tmpFileName = (*tmpList)[n];
				if (tmpFileName == fileName)
					return tmpDirectory + "/" + tmpFileName;
			}
		}
	}

	// Return that it was not found
	return "";
}

Also, there are sometimes meshes whose information is always needed at runtime, and by using a listener on the resource group being loaded (ResourceGroupManager::addResourceGroupListener), that can be handled before loading them:

Code: Select all

void CResourceGroupListener::resourceCreated(const ResourcePtr& resource)
{
	// Get the resource type that is about to be loaded
	CString tmpResourceType = resource->getCreator()->getResourceType();

	// Check if the resource about to be loaded is a mesh
	if (tmpResourceType == "Mesh")
		// Handle mesh load
		CGeneric::HandleMeshLoad(resource);
}

CGeneric::HandleMeshLoad is here:

Code: Select all

// Handles a mesh being loaded
// This is needed because: https://forums.ogre3d.org/viewtopic.php?p=554664#p554664
void CGeneric::HandleMeshLoad(const Ogre::ResourcePtr& resource)
{
	// Check if the mesh needs to be read on runtime by the CPU (also check CGeneric::Lock to validate that it works)
	// https://forums.ogre3d.org/viewtopic.php?p=554664#p554664
	CString tmpName = resource->getName();
	if (tmpName.Contains("_col_") || // For example: barrel0_col_convex.mesh
	    tmpName.Contains("ReadMesh")) // For all meshes created with the single purpose of being able to read a mesh
	{
		// Alter the mesh buffer policy to allow us to read it on runtime by the CPU
		MeshPtr tmpMeshPtr = Ogre::static_pointer_cast<Mesh>(resource);
		int tmpVertexBufferUsage = tmpMeshPtr->getVertexBufferUsage();
		int tmpIndexBufferUsage = tmpMeshPtr->getIndexBufferUsage();
		tmpMeshPtr->setVertexBufferPolicy(tmpVertexBufferUsage, true);
		tmpMeshPtr->setIndexBufferPolicy(tmpIndexBufferUsage, true);
	}
}

Then, to validate that everything works correctly, these functions should be used when locking instead:

Code: Select all

void LockHandleError(HardwareBuffer::LockOptions options, int usage, bool hasShadowBuffer)
{
	// Check for possible errors (reading from a HBU_STATIC_WRITE_ONLY/HBU_GPU_ONLY is apparently bad without using a shadow buffer: https://forums.ogre3d.org/viewtopic.php?p=554664#p554664)
	if (options == HardwareBuffer::LockOptions::HBL_READ_ONLY ||
		options == HardwareBuffer::LockOptions::HBL_NORMAL)
	{
		// For Direct3D9, the mesh attempting to get a shadow buffer instead does not get a shadow buffer and just gets assigned a usage of HBU_STATIC/HBU_GPU_TO_CPU instead.
		// For Direct3D11, the shadow buffer actually gets created and the usage stays the same.

		// For Direct3D9, if the usage is not HBU_STATIC/HBU_GPU_TO_CPU, it means that the mesh has not been fixed in CResourceGroupListener::resourceLoadStarted.
		if (usage != HardwareBuffer::HBU_STATIC /* HBU_GPU_TO_CPU */)
		{
			// In Direct3D9, the shadow buffer will always be empty, but in Direct3D11 it will have a shadow buffer if it was changed with CGeneric::HandleMeshLoad.
			if (!hasShadowBuffer)
			{
				// Write the message into the log
				CString tmpMessage = "Warning: LockHandleError, no shadow buffer found, add it with CGeneric::HandleMeshLoad";
				LogManager::getSingleton().getDefaultLog()->logMessage(tmpMessage);
			}
		}
	}
}

// Locks a buffer and checks for errors
void* CGeneric::Lock(HardwareVertexBufferSharedPtr ptr, HardwareBuffer::LockOptions options)
{
	LockHandleError(options, ptr->getUsage(), ptr->hasShadowBuffer());
	return ptr->lock(options);
}

// Locks a buffer and checks for errors
void* CGeneric::Lock(HardwareIndexBufferSharedPtr ptr, HardwareBuffer::LockOptions options)
{
	LockHandleError(options, ptr->getUsage(), ptr->hasShadowBuffer());
	return ptr->lock(options);
}

So for example, if the code before was:

Code: Select all

unsigned char* vertex = static_cast<unsigned char*>(vbuf->lock(Ogre::HardwareBuffer::HBL_READ_ONLY));

It should now instead be:

Code: Select all

unsigned char* vertex = static_cast<unsigned char*>(CGeneric::Lock(vbuf, Ogre::HardwareBuffer::HBL_READ_ONLY));

That way, if there is somewhere in your application that still has an invalid lock, the function will warn you in the log.
You can then put a breakpoint there and see exactly which mesh caused the issue, which you can then easily fix either by adding the mesh name to CGeneric::HandleMeshLoad or by cloning the mesh with this before reading/locking it:

Code: Select all

MeshPtr tmpClonedMesh = CMeshToMesh::ConvertToMesh(CGeneric::GetResourceFilePathFromFileName(meshPtr->getName()), "ReadMesh");
paroj
OGRE Team Member
Posts: 1994
Joined: Sun Mar 30, 2014 2:51 pm
x 1074

Re: Reading from HardwareBuffers

Post by paroj »

This is because when reading a mesh for Direct3D9, the function MeshSerializerImpl::readGeometryVertexBuffer calls D3D9HardwareBufferManager::createVertexBuffer, which disables shadow buffers and overwrites the usage to 1 (HBU_STATIC/HBU_GPU_TO_CPU).

not in upstream code, usage and shadowBuffer settings are correctly forwarded:
https://github.com/OGRECave/ogre/blob/v ... pp#L50-L64

Is there some way of cloning a mesh and temporarily just reading the vertex/index data instead of always having the mesh using a shadow buffer and also avoiding a usage of 1 (HBU_STATIC/HBU_GPU_TO_CPU) for Direct3D9?

create a HBU_CPU_ONLY buffer of the same size as the buffer you want to read and copy to it. See the D3D11 code referenced above.
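
A rough sketch of that (assuming the Ogre 1.12+ buffer API; `readVertexData` and its parameter are just illustration):

```cpp
#include <Ogre.h>

using namespace Ogre;

// Sketch: stage a GPU-only vertex buffer into a CPU-only buffer, then read the copy.
// 'src' is assumed to be an existing HBU_GPU_ONLY vertex buffer from a mesh.
void readVertexData(const HardwareVertexBufferSharedPtr& src)
{
    HardwareVertexBufferSharedPtr staging =
        HardwareBufferManager::getSingleton().createVertexBuffer(
            src->getVertexSize(), src->getNumVertices(),
            HBU_CPU_ONLY, false); // CPU-only, no shadow buffer needed
    staging->copyData(*src);      // GPU->CPU copy; how this happens is render-system specific
    const void* data = staging->lock(HardwareBuffer::HBL_READ_ONLY);
    // ... read the vertex data through 'data' here ...
    staging->unlock();
}   // 'staging' acted as a temporary shadow buffer and is released here
```

Note that copyData may itself fall back to locking the source buffer on render systems that lack a dedicated copy path.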

rpgplayerrobin
Gnoll
Posts: 619
Joined: Wed Mar 18, 2009 3:03 am
x 353

Re: Reading from HardwareBuffers

Post by rpgplayerrobin »

not in upstream code, usage and shadowBuffer settings are correctly forwarded:
https://github.com/OGRECave/ogre/blob/v ... pp#L50-L64

Does upstream code mean a later version of the code?
That sounds weird then, since I use 1.12.13, which is exactly what the linked code is from?

Because my code looks like this, which skips shadow buffers completely:

Code: Select all

HardwareVertexBufferSharedPtr
D3D9HardwareBufferManager::createVertexBuffer(size_t vertexSize, size_t numVerts,
	HardwareBuffer::Usage usage, bool useShadowBuffer)
{
	assert(numVerts > 0);

#if OGRE_D3D_MANAGE_BUFFERS
	// Override shadow buffer setting; managed buffers are automatically
	// backed by system memory
	// Don't override shadow buffer if discardable, since then we use
	// unmanaged buffers for speed (avoids write-through overhead)
	// Don't override if we use DirectX9Ex, since then we don't have the managed
	// pool. And creating a non-write-only default pool buffer causes a performance warning.
	if (useShadowBuffer && !(usage & HardwareBuffer::HBU_DISCARDABLE) &&
		!D3D9RenderSystem::isDirectX9Ex())
	{
		useShadowBuffer = false;
		// Also drop any WRITE_ONLY so we can read direct
		if (usage == HardwareBuffer::HBU_DYNAMIC_WRITE_ONLY)
		{
			usage = HardwareBuffer::HBU_DYNAMIC;
		}
		else if (usage == HardwareBuffer::HBU_STATIC_WRITE_ONLY)
		{
			usage = HardwareBuffer::HBU_STATIC;
		}
	}
	// If we have write-only buffers in DirectX9Ex we will turn on the discardable flag.
	// Otherwise Ogre will run at a far lower framerate.
	if (D3D9RenderSystem::isDirectX9Ex() && (usage & HardwareBuffer::HBU_WRITE_ONLY))
	{
		usage = (HardwareBuffer::Usage)
			((unsigned int)usage | (unsigned int)HardwareBuffer::HBU_DISCARDABLE);
	}
#endif

	auto impl = new D3D9HardwareBuffer(D3DFMT_VERTEXDATA, vertexSize * numVerts, usage, useShadowBuffer);
	auto buf = std::make_shared<HardwareVertexBuffer>(this, vertexSize, numVerts, impl);
	{
		OGRE_LOCK_MUTEX(mVertexBuffersMutex);
		mVertexBuffers.insert(buf.get());
	}
	return buf;
}

create a HBU_CPU_ONLY buffer of the same size as the buffer you want to read and copy to it. See the D3D11 code referenced above.

I don't understand this. Can you give an example of how to do this from an existing mesh without a shadow buffer and with a usage of 5?
I checked https://github.com/OGRECave/ogre/blob/4 ... #L144-L161 but I don't see anything I can use.

Cloning/reading a mesh with a shadow buffer is not hard, but doing it from a mesh that has a usage of 5 and no shadow buffer is the challenge here.

paroj
OGRE Team Member
Posts: 1994
Joined: Sun Mar 30, 2014 2:51 pm
x 1074

Re: Reading from HardwareBuffers

Post by paroj »

Does upstream code mean a later version of the code?
It then sounds weird since I use 1.12.13 exactly like the linked code also is from?

Because my code looks like this, which skips shadow buffers completely:

no, I was referring to the v1.12.13 tag I linked as upstream.
I guess you added the OGRE_D3D_MANAGE_BUFFERS block when porting to v1.12.13?

I don't understand this. Can you give an example on how to do this from an existing mesh without a shadow buffer and a usage of 5?
I checked the https://github.com/OGRECave/ogre/blob/4 ... #L144-L161 but I don't see anything I can use.

Instead of locking a HBU_GPU_ONLY buffer, you create a HBU_CPU_ONLY buffer and copy the contents of the former to the latter. Then you lock the HBU_CPU_ONLY buffer instead. The HBU_CPU_ONLY buffer acts as a temporary shadow buffer.

This is essentially what the linked code snippet does.

rpgplayerrobin
Gnoll
Posts: 619
Joined: Wed Mar 18, 2009 3:03 am
x 353

Re: Reading from HardwareBuffers

Post by rpgplayerrobin »

Instead of locking a HBU_GPU_ONLY buffer, you create a HBU_CPU_ONLY buffer and copy the contents of the former to the latter. Then you lock the HBU_CPU_ONLY buffer instead. The HBU_CPU_ONLY buffer acts as a temporary shadow buffer.

This is essentially what the linked code snipped does.

I still don't understand, though.
Are you talking about actually making changes in the Ogre source for all users, to work around this problem?
Because I don't see how it would be possible to read a HBU_STATIC_WRITE_ONLY/HBU_GPU_ONLY buffer and put it into a temporary HBU_CPU_ONLY buffer, since reading a usage-5 buffer was the entire problem this thread was about from the start.

Think of it this way, as the original post also wondered: how do you read a loaded mesh that is already HBU_STATIC_WRITE_ONLY/HBU_GPU_ONLY?

Can you give an example of how to do this from an existing mesh, like a loaded MeshPtr?
Because just reading the vertex/index data of a normally loaded mesh (using HBU_STATIC_WRITE_ONLY/HBU_GPU_ONLY) with HBL_READ_ONLY was the entire problem of this thread, so how do you actually read it correctly?

chilly willy
Halfling
Posts: 65
Joined: Tue Jun 02, 2020 4:11 am
x 19

Re: Reading from HardwareBuffers

Post by chilly willy »

rpgplayerrobin, it looks like you can copy from the HBU_GPU_ONLY buffer to the HBU_CPU_ONLY buffer by calling HardwareBuffer::copyData(), and the RenderSystem overrides of that function are implemented without calling HardwareBuffer::lock().

rpgplayerrobin
Gnoll
Posts: 619
Joined: Wed Mar 18, 2009 3:03 am
x 353

Re: Reading from HardwareBuffers

Post by rpgplayerrobin »

rpgplayerrobin, it looks like you can copy from the HBU_GPU_ONLY buffer to the HBU_CPU_ONLY buffer by calling HardwareBuffer::copyData() and that the RenderSystem overrides of that function are implemented without calling HardwareBuffer::lock()

Ah, I see!
I always thought that it was impossible to read from the HBU_GPU_ONLY buffer, but apparently it must be possible this way.
I will test it out tomorrow, and if I find a better solution for my code with it, I will update it here.

no, I was referring to the v1.12.13 tag I linked as upstream.
I guess you added the OGRE_D3D_MANAGE_BUFFERS block when porting to v1.12.13?

I checked around a bit and yeah, it seems I did add this.
But changing that code will probably break a lot of stuff; see the performance thread regarding that change:
viewtopic.php?p=552890#p552890

rpgplayerrobin
Gnoll
Posts: 619
Joined: Wed Mar 18, 2009 3:03 am
x 353

Re: Reading from HardwareBuffers

Post by rpgplayerrobin »

the docs are correct, but sometimes it will still work. E.g. on D3D11 there is code to explicitly handle the on-demand copy to CPU:
https://github.com/OGRECave/ogre/blob/4 ... #L144-L161

D3D9 and GL would generally handle this inside the Driver. On Vulkan this is not implemented, as it would only work before you start rendering a frame. Therefore, if you aim for portable code, you should not rely on this, but rather add shadow buffers when you intend to access the buffers on the CPU like this:
https://ogrecave.github.io/ogre/api/lat ... d17ef60daa

As I understand it, Direct3D11 fixes this by automatically creating a shadow buffer when reading the content, which removes the issue completely for Direct3D11?

So for other render systems (like Direct3D9/Vulkan/OpenGL), isn't it possible in lockImpl in OgreD3D9HardwareBuffer.cpp (https://github.com/OGRECave/ogre/blob/m ... Buffer.cpp) to just do the same thing that Direct3D11 is doing?

That way there is no issue anymore, right? At least for the scenario where the usage is 5 (GPU_ONLY) and the lock option is HBL_READ_ONLY?
Or is there something that I am missing? Is it better to just let the driver handle this?

Last edited by rpgplayerrobin on Wed Jun 07, 2023 11:35 pm, edited 1 time in total.
rpgplayerrobin
Gnoll
Posts: 619
Joined: Wed Mar 18, 2009 3:03 am
x 353

Re: Reading from HardwareBuffers

Post by rpgplayerrobin »

create a HBU_CPU_ONLY buffer of the same size as the buffer you want to read and copy to it. See the D3D11 code referenced above.

Instead of locking a HBU_GPU_ONLY buffer, you create a HBU_CPU_ONLY buffer and copy the contents of the former to the latter. Then you lock the HBU_CPU_ONLY buffer instead. The HBU_CPU_ONLY buffer acts as a temporary shadow buffer.

rpgplayerrobin, it looks like you can copy from the HBU_GPU_ONLY buffer to the HBU_CPU_ONLY buffer by calling HardwareBuffer::copyData() and that the RenderSystem overrides of that function are implemented without calling HardwareBuffer::lock()

I finally got around to trying this now, with Direct3D9, using this code:

Code: Select all

std::unique_ptr<HardwareBuffer> mShadowBuffer;
mShadowBuffer.reset(new D3D9HardwareBuffer(D3DFMT_VERTEXDATA, ptr->getSizeInBytes(), HBU_CPU_ONLY, false));
mShadowBuffer->copyData(*ptr); // ptr = HardwareVertexBufferSharedPtr

First I had to add this for it to work:
#include "RenderSystems/Direct3D9/OgreD3D9HardwareBuffer.h"
I also had to link "RenderSystem_Direct3D9.lib" into the application to get it to work.

But it does not solve the problem.

It comes into copyData in OgreHardwareBuffer.h and there it has no delegate so it proceeds to copy the data by locking it the normal way with HBL_READ_ONLY... So this way of doing it does not solve anything, at least not in Direct3D9.
It has exactly the same problem as trying to just lock it.
Even that function has a comment that says:
"Note that the source buffer must not be created with the usage HBU_WRITE_ONLY otherwise this will fail."

In Direct3D11, the copyData function is actually overridden, which means that it succeeds to copy the data since it instead goes into copyDataImpl and copies the data by using CopyResource, which I would guess always works, even for HBU_WRITE_ONLY buffers?

So now I think I understand why this is not done by default in Direct3D9 the same way as in Direct3D11: because it currently seems impossible?

It seems you cannot just add shadow buffers in Direct3D9 either?
If you look at the latest code here:
https://github.com/OGRECave/ogre/blob/m ... Buffer.cpp
You can see that the shadow buffer is not actually used anywhere in the lock function? Does that mean that you need to alter the usage from 5 (GPU_ONLY) to something else like 1 (STATIC) when doing this for Direct3D9? (Which my version of the code apparently already does.)

It comes back to my original question then:
Is there some way of "cloning" a mesh in Direct3D9 and temporarily just reading the vertex/index data instead of having to specifically change all meshes policies at startup to allow reading them?

chilly willy
Halfling
Posts: 65
Joined: Tue Jun 02, 2020 4:11 am
x 19

Re: Reading from HardwareBuffers

Post by chilly willy »

rpgplayerrobin wrote:

It comes into copyData in OgreHardwareBuffer.h and there it has no delegate so it proceeds to copy the data by locking it the normal way with HBL_READ_ONLY...

That's where I stopped being able to understand the code. I don't know what the delegate is or where/when/why it gets created...

rpgplayerrobin wrote:

So this way of doing it does not solve anything, at least not in Direct3D9.
It has exactly the same problem as trying to just lock it.

Perhaps it isn't actually a problem in Direct3D9. As paroj said, "but sometimes it will still work".

paroj wrote:

D3D9 and GL would generally handle this inside the Driver. On Vulkan this is not implemented, as it would only work before you start rendering a frame. Therefore, if you aim for portable code, you should not rely on this

So maybe with D3D9 you can lock a buffer that Ogre creates when Ogre's HBU_GPU_ONLY usage is specified; you just can't port that to other RenderSystems such as Vulkan. Maybe that's why the D3D9 RenderSystem doesn't need to override copyData(). Vulkan, on the other hand, does override copyData() (and avoids using lock unless it's CPU memory): https://github.com/OGRECave/ogre/blob/1 ... #L158-L176
But I still don't understand why Vulkan's lock() can't use copyData() to create a temporary shadow buffer just like D3D11 does.

rpgplayerrobin wrote:

It seems you cannot just add shadow buffers in Direct3D9 as well?
If you look at the latest code here:
https://github.com/OGRECave/ogre/blob/m ... Buffer.cpp
You can see that the shadow buffer is not actually used anywhere in the lock function?

The base class HardwareBuffer's lock() checks for and uses shadow buffers before calling lockImpl(): https://github.com/OGRECave/ogre/blob/1 ... #L218-L231 so I think shadow buffers always work (unless of course you override the settings like you did with 1.12.13). I'm not positive, though, because lock() is virtual as well and I haven't checked all subclasses...
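
To illustrate the pattern (a toy mock, not the actual Ogre source): a lock hands out the CPU-side shadow copy, and a write lock marks the real buffer for refresh on unlock:

```cpp
#include <cassert>
#include <cstring>
#include <vector>

// Toy illustration (NOT the actual Ogre code) of the shadow-buffer pattern:
// reads and writes go to a CPU-side copy, and unlock() pushes writes back
// to the "real" (GPU-side) buffer.
struct ToyBuffer
{
    std::vector<unsigned char> real;    // stands in for the GPU-side buffer
    std::vector<unsigned char> shadow;  // CPU-side shadow copy
    bool shadowDirty = false;

    explicit ToyBuffer(size_t size) : real(size), shadow(size) {}

    void* lock(bool readOnly)
    {
        if (!readOnly)
            shadowDirty = true;   // real buffer must be refreshed on unlock()
        return shadow.data();     // CPU access never touches 'real' directly
    }

    void unlock()
    {
        if (shadowDirty)
        {
            real = shadow;        // stands in for the GPU upload
            shadowDirty = false;
        }
    }
};
```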

paroj
OGRE Team Member
Posts: 1994
Joined: Sun Mar 30, 2014 2:51 pm
x 1074

Re: Reading from HardwareBuffers

Post by paroj »

rpgplayerrobin wrote: Wed Jun 07, 2023 11:31 pm

First I had to add this for it to be able to work:
#include "RenderSystems/Direct3D9/OgreD3D9HardwareBuffer.h"
I also had to import "RenderSystem_Direct3D9.lib" into the application in order to get it to work.

this or use HardwareBufferManager::createVertexBuffer

It comes into copyData in OgreHardwareBuffer.h and there it has no delegate so it proceeds to copy the data by locking it the normal way with HBL_READ_ONLY... So this way of doing it does not solve anything, at least not in Direct3D9.

ah, it seems this is a D3D9 API defect, as it lacks CopyResource like D3D11 has.
If the code works the way it is right now, I would just leave it alone. It's not like somebody will change the D3D9 behaviour to improve performance at this point.

That's where I stopped being able to understand the code. I don't know what the delegate is or where/when/why it gets created...

delegate allows the current buffer to act as a facade for another buffer. This is needed because Ogre traditionally distinguished between vertex/index buffers (like D3D9), while modern APIs just have generic buffers. However, Ogre vertex buffers inherit from generic buffers, so to avoid code duplication all buffers got the delegate member. It was the most elegant & backwards compatible solution I could come up with.

rpgplayerrobin
Gnoll
Posts: 619
Joined: Wed Mar 18, 2009 3:03 am
x 353

Re: Reading from HardwareBuffers

Post by rpgplayerrobin »

So maybe with D3D9 you can lock a buffer that Ogre creates when Ogre's HBU_GPU_ONLY usage is specified

But this entire thread is exactly about the fact that D3D9 cannot lock a GPU_ONLY buffer to read it.
I wish this were the case, since it would be so much easier than what this thread suggests has to be done for D3D9.
But all resources online say pretty much the same thing: that it is undocumented behaviour and can lead to invalid memory being read.

Some users mention that if D3D9 has managed buffers (which I do not know if Ogre is using), it could theoretically be possible to read from them, but I am not sure where or how that would be done in Ogre3D, so I am not really sure what to search for.
@paroj, help! :D

The base class HardwareBuffer's lock() checks for and uses shadow buffers before calling lockImpl()

The problem here is that the shadow buffer never actually gets copied before that happens, which means that the shadow buffer does not have the data from the source.
When debugging that function in D3D11, I can see that it gets called twice: first with the real buffer, which then copies itself into a shadow buffer, and then that shadow buffer gets locked with that function again.
In D3D9 I cannot see that happening, since the first lock in D3D9 never touches the shadow buffer in any way.
That is at least what I can see from the code (but I cannot debug it, since my code alters D3D9 buffers to STATIC instead of adding a shadow buffer).

ah, it seems this is a D3D9 API defect as it lacks CopyResource like D3D11.
If the code works the way it is right now, I would just leave it alone. Its not somebody will change the D3D9 behavior to improve performance at this point..

I guess users still have to add shadow buffers for all meshes that should be readable on D3D9? (Or even a usage of STATIC, if I am right that the shadow buffer is ignored on D3D9.)
Or do you mean to ignore that as well, just allow reading GPU_ONLY buffers, and let the driver handle it?

Also, I guess there is no way on D3D9 to copy a MeshPtr vertex/index buffer without my functions at viewtopic.php?p=554673#p554673? In that case those can be used.

Do we still have to add shadow buffers or alter their usage on Direct3D11? Or is that done automatically and safely, even for GPU_ONLY buffers without shadow buffers, using CopyResource?

paroj
OGRE Team Member
Posts: 1994
Joined: Sun Mar 30, 2014 2:51 pm
x 1074

Re: Reading from HardwareBuffers

Post by paroj »

rpgplayerrobin wrote: Fri Jun 09, 2023 2:10 pm

But this entire post talks exactly about that D3D9 cannot lock a GPU_ONLY buffer to read it.

do you actually run into any issues when doing so? If so which? Or is this merely a theoretical discussion?

rpgplayerrobin wrote: Fri Jun 09, 2023 2:10 pm

Some users are mentioning that if D3D9 has managed buffers (which I do not know if Ogre is using) it could be theoretically possible to read from them, but I am not sure where or how that would be done in Ogre3D so I am not really sure what to search for.
@paroj, help! :D

this refers to D3DPOOL_MANAGED. In Ogre there is a related option called "Allow DirectX9Ex".
Also note that there is the "Auto hardware buffer management" option, which forces shadow buffers for all buffers on D3D9.

Feel free to check the D3D9 RS code for any issues regarding buffer usage with these two options.

rpgplayerrobin wrote: Fri Jun 09, 2023 2:10 pm

The base class HardwareBuffer's lock() checks for and uses shadow buffers before calling lockImpl()

The problem here is that the shadow buffer never actually gets copied before that happens, which means that the shadow buffer does not have the data from the source.

The shadow buffers are updated first and the real buffer is updated on unlock(). I am pretty sure this works as intended, otherwise context loss handling would be broken with D3D9.

rpgplayerrobin
Gnoll
Posts: 619
Joined: Wed Mar 18, 2009 3:03 am
x 353

Re: Reading from HardwareBuffers

Post by rpgplayerrobin »

do you actually run into any issues when doing so? If so which? Or is this merely a theoretical discussion?

The first answer you wrote in this thread was that if you aim for portable code, you should handle this yourself with manually setting shadow buffers/usage on the meshes you want to read from.

So if you skip that step as well, and it still works on our computers, it does not necessarily mean it will work on all computers, right?
Their driver/graphics card might handle it differently, and if it does, the application would probably crash or not function correctly.

Also, there are many sources online which back this up, for example:

  • https://learn.microsoft.com/en-us/windo ... 9/d3dusage

    D3DUSAGE_WRITEONLY
    Informs the system that the application writes only to the vertex buffer. Using this flag enables the driver to choose the best memory location for efficient write operations and rendering. Attempts to read from a vertex buffer that is created with this capability will fail. Buffers created with D3DPOOL_DEFAULT that do not specify D3DUSAGE_WRITEONLY may suffer a severe performance penalty. D3DUSAGE_WRITEONLY only affects the performance of D3DPOOL_DEFAULT buffers.

  • https://gamedev.net/forums/topic/335848 ... writeonly/

    When you specify the write only flag, you give d3d a permission to place the memory in a place where it may not be available to read from. This does not mean that it is required to make it write-only, though - but if you requested it, you can't assume that you can read from the buffer then.

WRITEONLY allows (not forces) the card to use video memory, rather than AGP memory, which is faster for the card to access. Some cards may return valid data when you attempt to read the locked data, but you cannot rely on it. Other cards, or even the same card with a different driver may return invalid data. Reading from a WRITEONLY buffer, while it may work, is an error... just not one that throws up a big assert box.

I just want to solve this in the right way for @chilly willy and for myself, but doing it correctly seems to be quite confusing.
Basically, here are all the alternatives:

  1. Ignore the issue completely and rely on all drivers and graphics cards handling this correctly, even though the documentation says they most likely will not.

  2. Use Direct3D11 instead and completely skip Direct3D9. Though I do not know if this works, since I have asked this question multiple times without an answer:

    As I understand it, Direct3D11 fixes this issue completely by automatically creating a shadow buffer when reading its content, which completely removes this issue for Direct3D11?

    Do we still have to add shadow buffers or alter their usage on Direct3D11? Or is that done automatically safely even for GPU_ONLY buffers without shadow buffers using CopyResource?

  3. Alter all meshes that should be able to be read in Direct3D9 (and also Direct3D11 if #2 is not correct) with setVertexBufferPolicy/setIndexBufferPolicy in CResourceGroupListener::resourceCreated.
    Then, meshes that are only read occasionally but still need optimized rendering can be handled by reading from a temporary copy of the mesh:

    Code: Select all

    MeshPtr tmpClonedMesh = CMeshToMesh::ConvertToMesh(CGeneric::GetResourceFilePathFromFileName(meshPtr->getName()), "ReadMesh");

    This can also be used in combination with #2 but only for Direct3D9 if #2 is correct.

What way to solve it do you think is best @paroj?
And is my assumption in #2 actually correct or not?

paroj
OGRE Team Member
Posts: 1994
Joined: Sun Mar 30, 2014 2:51 pm
x 1074

Re: Reading from HardwareBuffers

Post by paroj »

rpgplayerrobin wrote: Sat Jun 10, 2023 6:23 pm

The first answer you wrote in this thread was that if you aim for portable code, you should handle this yourself with manually setting shadow buffers/usage on the meshes you want to read from.

Ah.. sorry.. I meant portable between different rendersystems, not portable between different GPUs while using D3D9. You can assume that if it works on your machine, it will also work everywhere else. D3D9 is now so old that the common behavior is a de-facto standard - even when re-implementing it on top of Vulkan (dxvk).

rpgplayerrobin wrote: Sat Jun 10, 2023 6:23 pm

As I understand it, Direct3D11 fixes this issue completely by automatically creating a shadow buffer when reading its content, which completely removes this issue for Direct3D11?
Do we still have to add shadow buffers or alter their usage on Direct3D11? Or is that done automatically safely even for GPU_ONLY buffers without shadow buffers using CopyResource?

sorry I overlooked the D3D11 aspect so far. Yes on D3D11 you can basically ignore the issue as it is handled internally by Ogre.

However, the drawback of not using shadow buffers is that you need to wait for the GPU>CPU transfer on each access. So if you do the raycasting each frame, it is better to cache the buffer data on the CPU (essentially enabling the shadow buffers).

If you only need the data once, the portable way (as in works on Vulkan too) is what I wrote here:
viewtopic.php?p=554677#p554677

The issue with Vulkan is that you cannot do GPU<>CPU transfers once a renderpass (a concept absent from earlier APIs) has started.
One could actually do the same as on D3D11 and just generate a meaningful error message when a renderpass is active.

rpgplayerrobin
Gnoll
Posts: 619
Joined: Wed Mar 18, 2009 3:03 am
x 353

Re: Reading from HardwareBuffers

Post by rpgplayerrobin »

Ok nice!

Then the best solution is alternative #3 from my earlier post, but applied to both Direct3D9 and Direct3D11 at the same time.
So the code posted in this thread is still the best solution for this, if you prioritize read speed.

The only thing that could be changed: instead of using my ConvertToMesh, on Direct3D11 (but not Direct3D9) you could use HardwareBuffer::copyData() together with HardwareBufferManager::createVertexBuffer/createIndexBuffer to copy the mesh data more efficiently, instead of having to load the entire mesh again via ConvertToMesh.

chilly willy
Halfling
Posts: 65
Joined: Tue Jun 02, 2020 4:11 am
x 19

Re: Reading from HardwareBuffers

Post by chilly willy »

Thanks to paroj and rpgplayerrobin for helping clear this up.

Now I'm thinking about how this relates to software animation (e.g. if we call addSoftwareAnimationRequest()). It looks like the buffers that hold the result of software animation are created with HBU_CPU_TO_GPU and shadow buffers, so we can always lock them for reading and it won't read from the GPU.

However it looks like the software animation functions Mesh::softwareVertexBlend() and Entity::finalisePoseNormals() do call lock on our original mesh VertexBuffers so we would have to use shadow buffers (or a different usage) to guarantee that works correctly. Is that correct?

paroj
OGRE Team Member
Posts: 1994
Joined: Sun Mar 30, 2014 2:51 pm
x 1074

Re: Reading from HardwareBuffers

Post by paroj »

no, softwareVertexBlend uses mSoftwareVertexAnimVertexData which copies the data to CPU_TO_GPU buffers.

rpgplayerrobin
Gnoll
Posts: 619
Joined: Wed Mar 18, 2009 3:03 am
x 353

Re: Reading from HardwareBuffers

Post by rpgplayerrobin »

Code: Select all

// Blend, taking source from either mesh data or morph data
Mesh::softwareVertexBlend(
(mMesh->getSharedVertexDataAnimationType() != VAT_NONE) ?
mSoftwareVertexAnimVertexData.get() : mMesh->sharedVertexData,
mSkelAnimVertexData.get(),
blendMatrices, mMesh->sharedBlendIndexToBoneIndexMap.size(),
blendNormals);

I debugged the code above: getSharedVertexDataAnimationType() returns VAT_NONE, so the expression selects mMesh->sharedVertexData and not mSoftwareVertexAnimVertexData.
mSoftwareVertexAnimVertexData is even empty at this point.

So inside softwareVertexBlend, sourceVertexData is actually mMesh->sharedVertexData, and that data gets read with read only:

Code: Select all

HardwareBufferLockGuard srcPosLock(srcPosBuf, HardwareBuffer::HBL_READ_ONLY);

I also confirmed that srcPosBuf has an mUsage of 5, which is GPU_ONLY, which means this code is incorrect?

So it seems @chilly willy is correct.

paroj
OGRE Team Member
Posts: 1994
Joined: Sun Mar 30, 2014 2:51 pm
x 1074

Re: Reading from HardwareBuffers

Post by paroj »

ah.. yes.. software skeletal animation..

I also confirmed that srcPosBuf has an mUsage of 5, which is GPU_ONLY, which means this code is incorrect?

no, it is not incorrect but rather slow. If the driver does not intervene, we download the data from the GPU each frame without ever using it on the GPU.

Post Reply