setNumThreadGroupsBasedOn function for compute shaders

Post by SolarPortal » Sat Nov 09, 2019 5:51 pm

Hey, me again with more tweaks that I feel were missing from the source.
This time it's about the setNumThreadGroupsBasedOn() function, which makes it easy to set the number of thread groups for a job based on a texture or UAV size. However, when I was working with mipmaps it did not behave as expected: it does not take the mip level into account, and the number of thread groups that came out at the end was always "1,1,(depth)".
So I made a few tweaks that make it much easier to work with UAVs and compute passes that need to edit each mip level, without having to change the thread groups per mip resolution.

So I'll get straight to the tweaks. All the changes are in "OgreHlmsComputeJob.cpp", in the function "_calculateNumThreadGroupsBasedOnSetting()".

After this line:

Code:

const TexturePtr &tex = texSlots[mThreadGroupsBasedOnTexSlot].texture;

I added:

Code:

const int32 mipLevel = texSlots[mThreadGroupsBasedOnTexSlot].mipmapLevel;

I then altered the resolution section to accommodate the changes:

Code:

      
uint32 resolution[3];
resolution[0] = tex->getWidth() >> mipLevel;
resolution[1] = tex->getHeight() >> mipLevel;
resolution[2] = tex->getDepth(); // >> mipLevel;
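
One edge case worth keeping in mind with the plain shift: on a non-square texture the smaller dimension reaches 0 before the larger one runs out of mips, which would then give 0 thread groups on that axis. A minimal sketch of a clamped version, assuming a hypothetical helper named mipExtent (not part of Ogre) and a made-up 256x64 texture:

```cpp
#include <cstdint>

// Hypothetical helper, not an Ogre function.
// Extent of one texture dimension at the given mip level, clamped to 1 so a
// narrow dimension never collapses to 0 texels.
// Assumes mipLevel < 32 (shifting a 32-bit value by more is undefined).
constexpr std::uint32_t mipExtent( std::uint32_t baseExtent, std::uint32_t mipLevel )
{
    return ( baseExtent >> mipLevel ) > 0u ? ( baseExtent >> mipLevel ) : 1u;
}

// Illustrative 256x64 texture.
static_assert( mipExtent( 256u, 0u ) == 256u, "mip 0 keeps the full width" );
static_assert( mipExtent( 256u, 3u ) == 32u, "each mip level halves the extent" );
static_assert( ( 64u >> 7u ) == 0u, "a plain shift collapses the height to 0 at mip 7" );
static_assert( mipExtent( 64u, 7u ) == 1u, "the clamped version stays at 1" );
```

In the function itself this would amount to clamping the two shifted values to a minimum of 1u.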

And finally, to make sure that the correct number of thread groups is created based on the divisor factor, I changed this:

Code:

resolution[i] = (resolution[i] + mThreadGroupsBasedDivisor[i] - 1u) /
                mThreadGroupsBasedDivisor[i];

uint32 numThreadGroups = (resolution[i] + mThreadsPerGroup[i] - 1u) /
                         mThreadsPerGroup[i];

to this (what was there did not make sense to me, and I believe it was a bug):

Code:

uint32 numThreadGroups = (resolution[i] + mThreadGroupsBasedDivisor[i] - 1u) / mThreadGroupsBasedDivisor[i];
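
To see what the change does to the numbers, here is a small standalone sketch of both formulas for a single axis. ceilDiv, groupsOriginal and groupsModified are hypothetical names used only for illustration, and the 1024 texels and 8 threads per group are made-up values:

```cpp
#include <cstdint>

// Integer ceiling division: smallest n such that n * b >= a (b must be non-zero).
constexpr std::uint32_t ceilDiv( std::uint32_t a, std::uint32_t b )
{
    return ( a + b - 1u ) / b;
}

// Formula before the change, as quoted above:
// divide by the divisor first, then by the threads per group.
constexpr std::uint32_t groupsOriginal( std::uint32_t resolution, std::uint32_t divisor,
                                        std::uint32_t threadsPerGroup )
{
    return ceilDiv( ceilDiv( resolution, divisor ), threadsPerGroup );
}

// Formula after the change, as quoted above: divide by the divisor only.
constexpr std::uint32_t groupsModified( std::uint32_t resolution, std::uint32_t divisor )
{
    return ceilDiv( resolution, divisor );
}

// Illustrative numbers: 1024 texels along one axis, 8 threads per group.
static_assert( groupsOriginal( 1024u, 1u, 8u ) == 128u, "1024 / 1, then / 8" );
static_assert( groupsModified( 1024u, 1u ) == 1024u, "divisor 1 gives one group per texel" );
static_assert( groupsModified( 1024u, 8u ) == 128u, "divisor 8 gives one group per 8 texels" );
static_assert( groupsModified( 100u, 8u ) == 13u, "partial groups are rounded up" );
```

With the modified formula the divisor alone decides the group count, so it has to already account for the threads per group (for example 8 for an 8-wide group that writes one texel per thread).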

Finally, here is my version of the function, which is working nicely when binding a UAV's different mipmaps:

Code:

    void HlmsComputeJob::_calculateNumThreadGroupsBasedOnSetting()
    {
        bool hasChanged = false;

        if( mThreadGroupsBasedOnTexture != ThreadGroupsBasedOnNothing )
        {
            const TextureSlotVec &texSlots = mThreadGroupsBasedOnTexture == ThreadGroupsBasedOnTexture ?
                        mTextureSlots : mUavSlots;

            if( mThreadGroupsBasedOnTexSlot < texSlots.size() &&
                !texSlots[mThreadGroupsBasedOnTexSlot].texture.isNull() )
            {
                const TexturePtr &tex = texSlots[mThreadGroupsBasedOnTexSlot].texture;
                const int32 mipLevel = texSlots[mThreadGroupsBasedOnTexSlot].mipmapLevel;

                uint32 resolution[3];
                resolution[0] = tex->getWidth() >> mipLevel;
                resolution[1] = tex->getHeight() >> mipLevel;
                resolution[2] = tex->getDepth(); // >> mipLevel;

                if( tex->getTextureType() == TEX_TYPE_CUBE_MAP )
                    resolution[2] = tex->getNumFaces();

                for( int i=0; i<3; ++i )
                {
                    uint32 numThreadGroups = (resolution[i] + mThreadGroupsBasedDivisor[i] - 1u) / mThreadGroupsBasedDivisor[i];

                    if( mNumThreadGroups[i] != numThreadGroups )
                    {
                        mNumThreadGroups[i] = numThreadGroups;
                        hasChanged = true;
                    }
                }
            }
            else
            {
                LogManager::getSingleton().logMessage(
                            "WARNING: No texture/uav bound to compute job '" + mName.getFriendlyText() +
                            "' at slot " + StringConverter::toString(mThreadGroupsBasedOnTexSlot) +
                            " while calculating number of thread groups based on texture");
            }
        }

        if( hasChanged )
            mPsoCacheHash = -1;
    }

Hope this makes sense and helps someone. :D
Now all I need to figure out is how to bind a constant buffer and change the value of a property per mipmap. :P
Lead developer of the Skyline Game Engine: https://aurasoft-skyline.co.uk

Post by dark_sylinc » Sat Nov 09, 2019 6:31 pm

Sounds like you would greatly benefit from Ogre 2.2.

The issues you're having are all texture related, and they were fixed in 2.2 8)

Post by dark_sylinc » Sat Nov 09, 2019 6:33 pm

SolarPortal wrote:
Sat Nov 09, 2019 5:51 pm
Now all I need to figure out is how to bind a constant buffer and change the value of a property per mipmap. :P
I want to make this easier when using the compositor. But for the time being, solutions like that one have to either rely on a custom compositor pass (like the mipmapping filter does) or use compute manually (which is much more flexible, but may be a bit hard if you're unaware of how barriers work and when they are needed).

Post by SolarPortal » Sat Nov 09, 2019 9:51 pm

I see :) We have been thinking of moving to 2.2, but with the work involved in changing the texturing system over to match, it is just too much work for the time being. We will definitely be moving to 2.2 at some point in the near future, though.

I think I have a solution: use the compositor pass listeners, hook passEarlyPreExecute, and change the data of a single shared constant buffer per pass. That way the compute job only has the one buffer to manage, but each pass can access and change the data to match. Unless there is something wrong with doing it that way... I only recently got into compute shaders lol :P
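
A rough standalone sketch of that idea, just to show the control flow. Every type here (MipParams, SharedConstBuffer, Pass, PassListener, MipParamsListener, runPasses) is a hypothetical stand-in and not an Ogre class; only the hook name passEarlyPreExecute is taken from the description above. It also assumes each pass carries an identifier equal to the mip level it should process:

```cpp
#include <cstdint>
#include <vector>

// Hypothetical per-mip data the compute shader would read.
struct MipParams
{
    std::uint32_t mipLevel;
    float texelSize;
};

// Hypothetical stand-in for a single shared constant buffer.
struct SharedConstBuffer
{
    MipParams data;
    void upload( const MipParams &params ) { data = params; }
};

// Hypothetical stand-in for a compositor pass.
struct Pass
{
    std::uint32_t identifier; // assumed to equal the mip level of this pass
};

// Hypothetical listener interface with a hook called right before a pass runs.
struct PassListener
{
    virtual ~PassListener() {}
    virtual void passEarlyPreExecute( const Pass &pass ) = 0;
};

// Rewrites the shared buffer so each pass sees the values for its own mip.
struct MipParamsListener : PassListener
{
    SharedConstBuffer *buffer;
    std::uint32_t baseWidth;

    MipParamsListener( SharedConstBuffer *buf, std::uint32_t width ) :
        buffer( buf ), baseWidth( width ) {}

    void passEarlyPreExecute( const Pass &pass ) override
    {
        const std::uint32_t mip = pass.identifier;
        const std::uint32_t shifted = baseWidth >> mip;
        const std::uint32_t width = shifted > 0u ? shifted : 1u;
        buffer->upload( MipParams{ mip, 1.0f / static_cast<float>( width ) } );
    }
};

// Runs the passes in order and records what each pass would read from the buffer.
inline std::vector<MipParams> runPasses( PassListener &listener, const SharedConstBuffer &buffer,
                                         const std::vector<Pass> &passes )
{
    std::vector<MipParams> seen;
    for( const Pass &pass : passes )
    {
        listener.passEarlyPreExecute( pass );
        seen.push_back( buffer.data ); // the compute job would execute here
    }
    return seen;
}
```

This only models the CPU-side ordering. Whether reusing one buffer across passes is safe on the GPU depends on how the buffer is updated relative to the dispatches, which the sketch does not cover.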

Thanks for responding though :)