Flush GPU command buffer to stop input lag.


Flush GPU command buffer to stop input lag.

Post by mkultra333 »

Hope this doesn't count as cross posting... I have a thread about this in Help but the actual problem and "solution" ended up far removed from what my initial concerns were.

If you want, you can read how this got started here, http://www.ogre3d.org/forums/viewtopic.php?f=2&t=50360.

Long story short: on an Athlon dual core running XP SP2 with an Nvidia 7950GT, I was experiencing severely laggy input from the mouse and keyboard. This was noticeable at 50-60 FPS and terrible at 25-30 FPS. Since I'm working on a first-person shooter, any control lag is unacceptable.

I eliminated various possible causes and pinpointed it as some kind of GPU issue. As best I can tell, if the game loop calls renderOneFrame when the GPU command buffer is full, there is a sudden huge drop in control responsiveness, even though the FPS gives no hint of it. Any time-wasting CPU delay (like Sleep(n) or a pointless while loop) that allows the GPU to catch up fixed the lag, but this was impossible to implement as a real solution because the correct amount of time-wasting can never be reliably predicted.
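Just to illustrate what I mean by a time-wasting delay, what I was testing with was something like this (a sketch only: the 10ms value is an arbitrary guess that depends entirely on the scene, which is why it isn't a usable fix, and mRoot stands for whatever Ogre::Root pointer your app holds):

Code:

	// diagnostic hack, not a fix: stall the CPU before rendering so the
	// GPU command buffer has a chance to drain
	Sleep(10) ;
	mRoot->renderOneFrame() ;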

I have found a fix to this problem, although only for DirectX so far. To get rid of the lag, it is necessary to flush the GPU command buffer. Since I don't know of any way to do this from inside Ogre normally, it requires modifying and recompiling the RenderSystem_Direct3D9.dll.

I doubt this solution is optimal. My understanding is that flushing the GPU command buffer is kind of extreme and hurts performance. I'm no expert on D3D either so perhaps there are other issues with this solution. Hopefully those more knowledgeable can offer some guidance here.

Anyhow, here's the fix.

In OgreD3D9RenderWindow.cpp, find swapBuffers(). Make the following change, which adds an extra function call and an extra function. Of course, add the new emptyGPUCommandBuffer() declaration to the header as well.

Code:

	void D3D9RenderWindow::swapBuffers( bool waitForVSync )
	{
		// access device through driver
		LPDIRECT3DDEVICE9 mpD3DDevice = mDriver->getD3DDevice();
		if( mpD3DDevice )
		{
			HRESULT hr;
			if (mIsSwapChain)
			{
				hr = mpSwapChain->Present(NULL, NULL, NULL, NULL, 0);
			}
			else
			{
				hr = mpD3DDevice->Present( NULL, NULL, 0, NULL );
			}
			if( D3DERR_DEVICELOST == hr )
			{
				SAFE_RELEASE(mpRenderSurface);

				static_cast<D3D9RenderSystem*>(
					Root::getSingleton().getRenderSystem())->_notifyDeviceLost();
			}
			else if( FAILED(hr) )
				OGRE_EXCEPT(Exception::ERR_RENDERINGAPI_ERROR, "Error Presenting surfaces", "D3D9RenderWindow::swapBuffers" );
		
		
			emptyGPUCommandBuffer() ;
		
		}
	}

	// mkultra333, force GPU command buffer to empty.
	void D3D9RenderWindow::emptyGPUCommandBuffer()
	{
		LPDIRECT3DDEVICE9 mpD3DDevice = mDriver->getD3DDevice();

		if(mpD3DDevice)
		{
			IDirect3DQuery9* pEventQuery=NULL ;
			mpD3DDevice->CreateQuery(D3DQUERYTYPE_EVENT, &pEventQuery) ;

			if(pEventQuery!=NULL)
			{
				// issue an event marker and spin until the GPU has consumed
				// everything queued before it, forcing the command buffer to drain
				pEventQuery->Issue(D3DISSUE_END) ;
				while(S_FALSE == pEventQuery->GetData(NULL, 0, D3DGETDATA_FLUSH)) ;

				// release the query so we don't leak one per frame
				pEventQuery->Release() ;
			}
		}

	}
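For completeness, the matching declaration goes into OgreD3D9RenderWindow.h; something along these lines (the exact placement and access level depend on your Ogre version):

Code:

	// OgreD3D9RenderWindow.h, inside class D3D9RenderWindow
	/// mkultra333: force the GPU command buffer to empty (see the .cpp above)
	void emptyGPUCommandBuffer();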
Any feedback on whether this solution is flawed, or on a better fix, is most welcome, as is an implementation of the fix for the OpenGL renderer.
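For the OpenGL side I haven't tried anything yet, but the closest equivalent I know of would be a glFinish() call after the buffer swap, which blocks until the GL command queue has drained. Just a sketch; exactly where to hook it into the GL render window is an assumption on my part:

Code:

	// hypothetical GL equivalent (untested): after the buffer swap,
	// block until the GPU has executed everything queued so far
	glFinish() ;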

And note, even if you've never encountered this bug yourself, it's possible that other people who use your app might. Something to keep in mind if you're writing a game you expect others with unknown hardware to play.

Edit: Added "if(pEventQuery!=NULL)" safety check.
"In theory there is no difference between practice and theory. In practice, there is." - Psychology Textbook.

Re: Flush GPU command buffer to stop input lag.

Post by sinbad »

This is basically a problem of the CPU outpacing the GPU and filling up the command buffer. The easiest way to resolve it is to make the CPU do more work :)

Whilst your approach will cause the GPU buffers to flush, it will do it every frame with the same query, which means you'd be losing the benefit of multi-frame queueing. This will kill the benefit of multi-GPU alternate frame rendering, for example, and also will prevent the GPU doing useful work during other downtime periods, so is not desirable in a general sense.

Another way, which should also address the issue on GL, is to use HardwareOcclusionQuery. These are useful because a) they work on all render systems, and b) forcing them to give you their results (i.e. blocking the CPU until the GPU is done) can be specifically targeted at a given frame. Using this, you can implement a strategy which only waits for a previous frame to be processed, rather than the one you just issued to the GPU. This means you can still have a fixed number of frames in the command buffer queue (which is conducive to multi-GPU systems and performance levelling), but no more than that; capping the number of frames lets you control the amount of data in the command buffer at once. Thus the CPU will wait until the GPU is no more than X frames behind, which should give you enough consistent breathing space in the command buffer.

So, let's say you take the simplest approach (that is still multi-GPU friendly) and want no more than 2 frames to be in the command buffer at once. You would need 2 HardwareOcclusionQueries, each of which is targeted at the contents of one of those frames. You would flip-flop between them each frame, defining the beginning/end of the data you're interested in, and then wait for the one issued in the previous frame to complete (which means waiting for the GPU to finish the previous frame, but not the current one). Something like this:

Code:

// CPU work

// flip/flop
std::swap(query1, query2);

// define boundaries of the query for this frame
query1->beginOcclusionQuery();
// render
query1->endOcclusionQuery();

// pull the query for the *previous* frame, this will flush up to the end of the previous frame, but no further
// this leaves some parallelism for the GPU but keeps the command buffer at a fixed maximum size
unsigned int dummy;
query2->pullOcclusionQuery(&dummy);
If you wanted to support more GPUs in AFR mode, you'd want to increase the number of occlusion queries and pull them such that you keep X frames in the queue at once, where X is the number of GPUs doing AFR.

Have a go with this and let me know. You will probably need to put the HOQs in a RenderQueueListener so that they occur between the begin/end scene.
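Roughly what I mean by that last point, as a sketch (the queue group ids used as begin/end markers are just an example, pick whatever brackets your scene content):

Code:

// sketch: bracket the whole scene render with one query by hooking the
// first and last render queue groups, so begin/end happen inside beginScene/endScene
class FlushQueryListener : public Ogre::RenderQueueListener
{
public:
	Ogre::HardwareOcclusionQuery* mCurrentQuery;

	void renderQueueStarted(Ogre::uint8 queueGroupId, const Ogre::String& invocation, bool& skipThisInvocation)
	{
		if (queueGroupId == Ogre::RENDER_QUEUE_BACKGROUND)
			mCurrentQuery->beginOcclusionQuery();
	}
	void renderQueueEnded(Ogre::uint8 queueGroupId, const Ogre::String& invocation, bool& skipThisInvocation)
	{
		if (queueGroupId == Ogre::RENDER_QUEUE_OVERLAY)
			mCurrentQuery->endOcclusionQuery();
	}
};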

Re: Flush GPU command buffer to stop input lag.

Post by mkultra333 »

Neato! Thanks Sinbad.

My gappy C++ knowledge plus my very thin Ogre knowledge meant I was wandering in the woods for a few hours trying to implement the above. Until now I've barely touched listeners, and RenderQueues and HOQs are unknown territory. However I eventually managed to piece together the following.

I'm using the Basic Ogre Framework, http://www.ogre3d.org/wiki/index.php/Ba ... _Framework. I made the following changes to OgreFramework.hpp.

Add these two headers:

Code:

#include <OgreRenderQueueListener.h>
#include <OgreHardwareOcclusionQuery.h>
Add these two classes above the OgreFramework class.

Code:

class HOQ: public Ogre::HardwareOcclusionQuery
{
	public:
		int HardwareOcclusionQuery() {}
		void beginOcclusionQuery() {}
		void endOcclusionQuery() {}
		bool pullOcclusionQuery (unsigned int *NumOfFragments) 
		{
			while(isStillOutstanding()) ;
			return false ; 
		} 
		bool isStillOutstanding (void) { return mIsQueryResultStillOutstanding ; }



};

class rQueueListener: public Ogre::RenderQueueListener
{
public:
	
	rQueueListener() {}

	~rQueueListener() {}

	HOQ* Query1 ;
	HOQ* Query2 ;

	void renderQueueStarted(Ogre::uint8 queueGroupId, const Ogre::String &invocation, bool &skipThisInvocation) {}
	void renderQueueEnded(Ogre::uint8 queueGroupId, const Ogre::String &invocation, bool &skipThisInvocation) {}
};
In OgreFramework.cpp I changed the runDemo() function to the following:

Code:

void DemoApp::runDemo()
{
	OgreFramework::getSingletonPtr()->m_pLog->logMessage("Start main loop...");

	
	UINT uFrameStartTime=OgreFramework::getSingletonPtr()->m_pTimer->getMilliseconds();
	UINT uFrameTotalTime=0 ;

	OgreFramework::getSingletonPtr()->m_pRenderWnd->resetStatistics() ;
	

	/////////////////////////////////////////////////////////////////////////////////////////////
	// use a renderQueueListener with two HardwareOcclusionQueries to prevent the CPU from 
	// getting too far ahead of the GPU and causing input lag from keyboard and mouse.
	// thanks to Sinbad for this suggestion and code outline.
	// We aren't actually doing Hardware Occlusion Culling, just exploiting the way we can
	// make it flush the GPU buffer for *previous* frames.

	// add our renderQueueListener used to prevent the GPU command buffer filling and causing input lag
	rQueueListener* rqListener = new rQueueListener ;
	unsigned int dummy=0 ;
	bool dummyBoolA=false ;
	bool dummyBoolB=false ;
	rqListener->renderQueueStarted(0, "", dummyBoolA) ;
	rqListener->renderQueueEnded(100, "", dummyBoolB) ;
	OgreFramework::getSingletonPtr()->m_pSceneMgr->addRenderQueueListener(rqListener) ;

	// create our queries
	rqListener->Query1 = (HOQ*)OgreFramework::getSingletonPtr()->m_pRoot->getRenderSystem()->createHardwareOcclusionQuery() ;
	rqListener->Query2 = (HOQ*)OgreFramework::getSingletonPtr()->m_pRoot->getRenderSystem()->createHardwareOcclusionQuery() ;

	//
	/////////////////////////////////////////////////////////////////////////////////////////////


	while(!m_bShutdown && !OgreFramework::getSingletonPtr()->isOgreToBeShutDown()) 
	{
		if(OgreFramework::getSingletonPtr()->m_pRenderWnd->isClosed())m_bShutdown = true;

#if OGRE_PLATFORM == OGRE_PLATFORM_WIN32
			Ogre::WindowEventUtilities::messagePump() ;
#endif	



		if(OgreFramework::getSingletonPtr()->m_pRenderWnd->isActive())
		{

			// get start time of frame
			uFrameStartTime=OgreFramework::getSingletonPtr()->m_pTimer->getMicroseconds() ;
				
			// update input and physics
			OgreFramework::getSingletonPtr()->m_pKeyboard->capture();
			OgreFramework::getSingletonPtr()->m_pMouse->capture();
			OgreFramework::getSingletonPtr()->updateOgre(uFrameTotalTime/1000.0f);

			// swap queries
			std::swap(rqListener->Query1, rqListener->Query2) ;

			// define query beginning for this frame
			rqListener->Query1->beginOcclusionQuery() ;

			// render the frame
			OgreFramework::getSingletonPtr()->m_pRoot->renderOneFrame();

			// define query end for this frame
			rqListener->Query1->endOcclusionQuery() ;

			// pull query for previous frame.  Flushes GPU command buffer up to the end of the previous frame but no further.
			rqListener->Query2->pullOcclusionQuery(&dummy) ;

			// calculate frame time.
			uFrameTotalTime=OgreFramework::getSingletonPtr()->m_pTimer->getMicroseconds()-uFrameStartTime ;

		}
		else
		{
			Sleep(1000);
		}
	}
	
	// clean up our HOQ queries and renderQueueListener
	OgreFramework::getSingletonPtr()->m_pRoot->getRenderSystem()->destroyHardwareOcclusionQuery((Ogre::HardwareOcclusionQuery*) rqListener->Query1) ;
	OgreFramework::getSingletonPtr()->m_pRoot->getRenderSystem()->destroyHardwareOcclusionQuery((Ogre::HardwareOcclusionQuery*) rqListener->Query2) ;
OgreFramework::getSingletonPtr()->m_pSceneMgr->removeRenderQueueListener(rqListener) ;
	delete rqListener ;


	OgreFramework::getSingletonPtr()->m_pLog->logMessage("Main loop quit");
	OgreFramework::getSingletonPtr()->m_pLog->logMessage("Shutdown OGRE...");
}
This seems to work, and the FPS is better than my first method. Does it look correct?
Edit: Added removeRenderQueueListener.
"In theory there is no difference between practice and theory. In practice, there is." - Psychology Textbook.

Re: Flush GPU command buffer to stop input lag.

Post by Praetor »

Seems almost there. The purpose of the listener is that Ogre will automatically call your functions before and after a render queue is invoked, which means you don't need to call renderQueueStarted and renderQueueEnded manually. Though, you are also calling the queries' begin/end before and after the main renderOneFrame call. I'm not an expert with HOQ, but I think that would work fine as well, in which case you no longer need a RenderQueueListener and certainly don't need to register it with the scene manager.
Game Development, Engine Development, Porting
http://www.darkwindmedia.com

Re: Flush GPU command buffer to stop input lag.

Post by mkultra333 »

Thanks Praetor. I tested it without renderQueueStarted and renderQueueEnded, and it still worked fine, so I've removed them. People who use the above code should probably do the same.

I've made a better version that allows anywhere from 1 to 4 queries. At low FPS, even 2 queries still leads to a bit of lag, and I think it's worth losing a few FPS by flushing the GPU buffer every frame to get really snappy responsiveness, so I've allowed for just 1 query if the user wants. However, if the user has a killer system they can have up to 4 queries. (Actually more, since the maximum is controlled by a #define.)

Here's the new code. Add this just above the OgreFramework class.

Code:

#define MAXGPUQUERY 4

class HOQ: public Ogre::HardwareOcclusionQuery
{
	public:
		int HardwareOcclusionQuery() {}
		void beginOcclusionQuery() {}
		void endOcclusionQuery() {}
		bool pullOcclusionQuery (unsigned int *NumOfFragments) 
		{
			while(mIsQueryResultStillOutstanding) ;
			return false ; 
		} 
		bool isStillOutstanding (void) { return mIsQueryResultStillOutstanding ; }



};


class rQueueListener: public Ogre::RenderQueueListener
{
public:
	
	rQueueListener() {}
	~rQueueListener() {}

	HOQ* Query[MAXGPUQUERY] ;

	void renderQueueStarted(Ogre::uint8 queueGroupId, const Ogre::String &invocation, bool &skipThisInvocation) {}
	void renderQueueEnded(Ogre::uint8 queueGroupId, const Ogre::String &invocation, bool &skipThisInvocation) {}
};
Add this variable to the OgreFramework class.

Code:

	// Add this to OgreFramework and initialize it from 1 to MAXGPUQUERY
	// If you're good you'll make it private and add some get/set functions, 
	// I just made it public for now.

	// A value of 1 means we flush every frame, so no GPU command buffering.
	// This is good for low FPS because even just 1 buffer gives noticeable
	// input lag.  However users with high FPS can afford a few buffers.
	
	int m_MaxGPUQuery ;
And here's the new runDemo() function.

Code:

void DemoApp::runDemo()
{
	OgreFramework::getSingletonPtr()->m_pLog->logMessage("Start main loop...");

	
	UINT uFrameStartTime=OgreFramework::getSingletonPtr()->m_pTimer->getMilliseconds();
	UINT uFrameTotalTime=0 ;

	OgreFramework::getSingletonPtr()->m_pRenderWnd->resetStatistics() ;
	

	/////////////////////////////////////////////////////////////////////////////////////////////
	// use a renderQueueListener with 1 to 4 HardwareOcclusionQueries to prevent the CPU from 
	// getting too far ahead of the GPU and causing input lag from keyboard and mouse.
	// thanks to Sinbad for this suggestion and code outline.
	// We aren't actually doing Hardware Occlusion Culling, just exploiting the way we can
	// make it flush the GPU buffer for prior frames.

	rQueueListener* rqListener = new rQueueListener ;
	unsigned int dummy=0 ;
	OgreFramework::getSingletonPtr()->m_pSceneMgr->addRenderQueueListener(rqListener) ;

	// get the maximum gpu queries to be used.  
	int nMaxGPUQuery=OgreFramework::getSingletonPtr()->m_MaxGPUQuery ;

	// make sure it is in range.
	if(nMaxGPUQuery<1) 
		nMaxGPUQuery=1 ;
	else
		if(nMaxGPUQuery>MAXGPUQUERY) 
		nMaxGPUQuery=MAXGPUQUERY ;

	int nNewQuery=0 ;
	int nOldQuery=0 ;

	// create our queries
	HOQ** pHOQ=rqListener->Query ;
	for(nNewQuery=0 ; nNewQuery<nMaxGPUQuery ; nNewQuery++)
		pHOQ[nNewQuery] = (HOQ*)OgreFramework::getSingletonPtr()->m_pRoot->getRenderSystem()->createHardwareOcclusionQuery() ;

	nNewQuery=nOldQuery-1 ;
	if(nNewQuery<0)
		nNewQuery+=nMaxGPUQuery ;

	//
	/////////////////////////////////////////////////////////////////////////////////////////////


	while(!m_bShutdown && !OgreFramework::getSingletonPtr()->isOgreToBeShutDown()) 
	{
		if(OgreFramework::getSingletonPtr()->m_pRenderWnd->isClosed())m_bShutdown = true;

#if OGRE_PLATFORM == OGRE_PLATFORM_WIN32
			Ogre::WindowEventUtilities::messagePump() ;
#endif	



		if(OgreFramework::getSingletonPtr()->m_pRenderWnd->isActive())
		{

			// get start time of frame
			uFrameStartTime=OgreFramework::getSingletonPtr()->m_pTimer->getMicroseconds() ;
				
			// update input and physics
			OgreFramework::getSingletonPtr()->m_pKeyboard->capture();
			OgreFramework::getSingletonPtr()->m_pMouse->capture();
			OgreFramework::getSingletonPtr()->updateOgre(uFrameTotalTime/1000.0f);

			// increment the buffer.  
			nNewQuery=nOldQuery ;
			nOldQuery++ ;
			if(nOldQuery==nMaxGPUQuery)
				nOldQuery=0 ;


			// define query beginning for this frame
			pHOQ[ nNewQuery ]->beginOcclusionQuery() ;

			// render the frame
			OgreFramework::getSingletonPtr()->m_pRoot->renderOneFrame();

			// define query end for this frame
			pHOQ[ nNewQuery ]->endOcclusionQuery() ;

			// pull query for previous frame.  Flushes GPU command buffer up to the end of the previous frame but no further.
			pHOQ[ nOldQuery ]->pullOcclusionQuery(&dummy) ;

			// calculate frame time.
			uFrameTotalTime=OgreFramework::getSingletonPtr()->m_pTimer->getMicroseconds()-uFrameStartTime ;

		}
		else
		{
			Sleep(1000);
		}
	}
	
	// clean up our HOQ queries and renderQueueListener
	for(nNewQuery=0 ; nNewQuery<nMaxGPUQuery ; nNewQuery++)
		OgreFramework::getSingletonPtr()->m_pRoot->getRenderSystem()->destroyHardwareOcclusionQuery( (Ogre::HardwareOcclusionQuery*)pHOQ[nNewQuery] ) ;
	
	OgreFramework::getSingletonPtr()->m_pSceneMgr->removeRenderQueueListener(rqListener) ;
	delete rqListener ;


	OgreFramework::getSingletonPtr()->m_pLog->logMessage("Main loop quit");
	OgreFramework::getSingletonPtr()->m_pLog->logMessage("Shutdown OGRE...");
}
Edit: More consistent variable naming.
Edit: Changed cast in destroyHardwareOcclusionQuery from (HOQ*) to (Ogre::HardwareOcclusionQuery*)
"In theory there is no difference between practice and theory. In practice, there is." - Psychology Textbook.

Re: Flush GPU command buffer to stop input lag.

Post by sinbad »

Sorry, catching up with this thread now.

There's a problem with what you're doing there - you can't actually subclass HardwareOcclusionQuery, because the RenderSystem already provides a subclass specialised to work with DirectX or GL, and you certainly can't cast the result from createHardwareOcclusionQuery to your own class (because it's not of that type)! I'm amazed that a) that code is working at all, and b) it's not exploding in some horribly nasty way! :)

If you want to make the process easier to use, then you should use composition instead of inheritance - so maybe add a HOQUtility class which takes the real HardwareOcclusionQuery pointers that the RenderSystem gives you (which are actually subclasses depending on the render system in use), and then makes them easier to use.
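Something in this direction, purely as a sketch (the class name, the per-frame hooks and the two-query depth are all up to you):

Code:

#include <algorithm>                    // std::swap
#include <OgreRenderSystem.h>
#include <OgreHardwareOcclusionQuery.h>

// sketch of a composition-based helper: it owns two render-system-created
// queries and hides the flip-flop / pull logic behind two calls per frame
class HOQUtility
{
public:
	HOQUtility(Ogre::RenderSystem* rsys)
		: mRenderSystem(rsys), mFirstFrame(true)
	{
		mQueries[0] = mRenderSystem->createHardwareOcclusionQuery();
		mQueries[1] = mRenderSystem->createHardwareOcclusionQuery();
	}
	~HOQUtility()
	{
		mRenderSystem->destroyHardwareOcclusionQuery(mQueries[0]);
		mRenderSystem->destroyHardwareOcclusionQuery(mQueries[1]);
	}
	// call just before rendering the frame
	void frameStart()
	{
		std::swap(mQueries[0], mQueries[1]);
		mQueries[0]->beginOcclusionQuery();
	}
	// call just after rendering the frame: ends this frame's query, then
	// blocks on the previous frame's query, capping the command buffer
	void frameEnd()
	{
		mQueries[0]->endOcclusionQuery();
		if (!mFirstFrame)
		{
			unsigned int dummy;
			mQueries[1]->pullOcclusionQuery(&dummy);
		}
		mFirstFrame = false;
	}
private:
	Ogre::RenderSystem* mRenderSystem;
	Ogre::HardwareOcclusionQuery* mQueries[2];
	bool mFirstFrame;
};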

Despite my amazement that your subclassing approach works at all, I'm glad the overall technique worked. I'm not surprised the frame rate is better; it's because you're allowing up to one frame of parallel CPU / GPU work rather than making them run in lockstep.

Re: Flush GPU command buffer to stop input lag.

Post by mkultra333 »

sinbad wrote:Sorry, catching up with this thread now.

There's a problem with what you're doing there - you can't actually subclass HardwareOcclusionQuery, because the RenderSystem already provides a subclass specialised to work with DirectX or GL, and you certainly can't cast the result from createHardwareOcclusionQuery to your own class (because it's not of that type)! I'm amazed that a) that code is working at all, and b) it's not exploding in some horribly nasty way! :)
Oops. I guess the cog teeth just happened to mesh; neither the compiler nor the graphics card seemed to mind.
If you want to make the process easier to use, then you should use composition instead of inheritance - so maybe add a HOQUtility class which takes pointers to the real HardwareOcclusionQuery pointers that the RenderSystem gives you (which are actually subclasses depending on the render system in use), and then makes them easier to use.
I just got rid of my subclass and used Ogre::HardwareOcclusionQuery pointers directly. Not a lot is being done really... a HOQUtility class seemed unnecessary clutter.
Despite my amazement that your subclassing approach works at all, I'm glad the overall technique worked. I'm not surprised the frame rate is better, it's because it's allowing up to one frame of parallel CPU / GPU work rather than making them work in lockstep.
Different situations give different results. At low fps in normal modes flushing after every frame still works best. But at higher FPS you can use more command buffers and get even better FPS. I also notice that even at low FPS, stereoscopic works better with the command buffer enabled, probably because rendering one frame is really two renders from different perspectives.

Apart from getting rid of the subclass, I've also added the option to have no queries at all by setting the number of queries to 0. This just skips all the messing about with the GPU command buffer entirely, in case it causes problems on some configuration or other.

Here's the new code.

Include these headers.

Code:

#include <OgreRenderQueueListener.h>
#include <OgreHardwareOcclusionQuery.h>
Add the following listener class. (Don't add the HOQ class anymore.)

Code:

#define MAXGPUQUERY 4

class rQueueListener: public Ogre::RenderQueueListener
{
public:
	
	rQueueListener() {}
	~rQueueListener() {}

	Ogre::HardwareOcclusionQuery* Query[MAXGPUQUERY] ;

	void renderQueueStarted(Ogre::uint8 queueGroupId, const Ogre::String &invocation, bool &skipThisInvocation) {}
	void renderQueueEnded(Ogre::uint8 queueGroupId, const Ogre::String &invocation, bool &skipThisInvocation) {}
};
Add this variable to the OgreFramework class, and initialize it as desired.

Code:

	// Add this to OgreFramework and initialize it from 0 to MAXGPUQUERY
	// If you're good you'll make it private and add some get/set functions, 
	// I just made it public for now.

	// A value of 0 means don't mess with the GPU buffers at all.  There
	// might be some systems where the queries cause problems, so let the user 
	// deactivate the queries completely if desired.

	// A value of 1 means we flush every frame, so no GPU command buffering.
	// This is good for low FPS because even just 1 buffer gives noticeable
	// input lag.  However users with high FPS can afford a few buffers.
	
	int m_MaxGPUQuery ;
Change the DemoApp main loop to the following.

Code:

void DemoApp::runDemo()
{
	OgreFramework::getSingletonPtr()->m_pLog->logMessage("Start main loop...");

	
	UINT uFrameStartTime=OgreFramework::getSingletonPtr()->m_pTimer->getMilliseconds();
	UINT uFrameTotalTime=0 ;

	OgreFramework::getSingletonPtr()->m_pRenderWnd->resetStatistics() ;
	

	/////////////////////////////////////////////////////////////////////////////////////////////
	// use a renderQueueListener with 1 to 4 HardwareOcclusionQueries to prevent the CPU from 
	// getting too far ahead of the GPU and causing input lag from keyboard and mouse.
	// thanks to Sinbad for this suggestion and code outline.
	// We aren't actually doing Hardware Occlusion Culling, just exploiting the way we can
	// make it flush the GPU buffer for prior frames.
	// Messing with the GPU command buffer can be turned off completely by setting m_MaxGPUQuery to 0 

	// get the maximum gpu queries to be used.  
	int nMaxGPUQuery=OgreFramework::getSingletonPtr()->m_MaxGPUQuery ;
	unsigned int dummy=0 ;
	int nNewQuery=0 ;
	int nOldQuery=0 ;
	rQueueListener* rqListener=NULL ;
	Ogre::HardwareOcclusionQuery** pHOQ=NULL ;


	if(nMaxGPUQuery!=0) // if querying is turned on
	{
		// make sure it is in range.
		if(nMaxGPUQuery<1) 
			nMaxGPUQuery=1 ;
		else
			if(nMaxGPUQuery>MAXGPUQUERY) 
			nMaxGPUQuery=MAXGPUQUERY ;


		rqListener = new rQueueListener ;
		
		OgreFramework::getSingletonPtr()->m_pSceneMgr->addRenderQueueListener(rqListener) ;

		// create our queries
		pHOQ=rqListener->Query ;
		for(nNewQuery=0 ; nNewQuery<nMaxGPUQuery ; nNewQuery++)
			pHOQ[nNewQuery] = OgreFramework::getSingletonPtr()->m_pRoot->getRenderSystem()->createHardwareOcclusionQuery() ;

		nNewQuery=nOldQuery-1 ;
		if(nNewQuery<0)
			nNewQuery+=nMaxGPUQuery ;
	}
	//
	/////////////////////////////////////////////////////////////////////////////////////////////


	while(!m_bShutdown && !OgreFramework::getSingletonPtr()->isOgreToBeShutDown()) 
	{
		if(OgreFramework::getSingletonPtr()->m_pRenderWnd->isClosed())m_bShutdown = true;

#if OGRE_PLATFORM == OGRE_PLATFORM_WIN32
			Ogre::WindowEventUtilities::messagePump() ;
#endif	



		if(OgreFramework::getSingletonPtr()->m_pRenderWnd->isActive())
		{

			// get start time of frame
			uFrameStartTime=OgreFramework::getSingletonPtr()->m_pTimer->getMicroseconds() ;
				
			// update input and physics
			OgreFramework::getSingletonPtr()->m_pKeyboard->capture();
			OgreFramework::getSingletonPtr()->m_pMouse->capture();
			OgreFramework::getSingletonPtr()->updateOgre(uFrameTotalTime/1000.0f);

			if(nMaxGPUQuery==0) // querying the GPU command buffer is disabled
			{
				// render the frame
				OgreFramework::getSingletonPtr()->m_pRoot->renderOneFrame();
			}
			else								// querying the GPU command buffer is enabled
			{
				// increment the buffer.  
				nNewQuery=nOldQuery ;
				nOldQuery++ ;
				if(nOldQuery==nMaxGPUQuery)
					nOldQuery=0 ;


				// define query beginning for this frame
				pHOQ[ nNewQuery ]->beginOcclusionQuery() ;

				// render the frame
				OgreFramework::getSingletonPtr()->m_pRoot->renderOneFrame();

				// define query end for this frame
				pHOQ[ nNewQuery ]->endOcclusionQuery() ;

				// pull query for a prior frame.  Flushes GPU command buffer up to the end of a prior frame but no further.
				pHOQ[ nOldQuery ]->pullOcclusionQuery(&dummy) ;
			}


			// calculate frame time.
			uFrameTotalTime=OgreFramework::getSingletonPtr()->m_pTimer->getMicroseconds()-uFrameStartTime ;

		}
		else
		{
			Sleep(1000);
		}
	}
	
	if(nMaxGPUQuery>0) // if necessary, clean up our HOQ queries and renderQueueListener
	{
	
		for(nNewQuery=0 ; nNewQuery<nMaxGPUQuery ; nNewQuery++)
			OgreFramework::getSingletonPtr()->m_pRoot->getRenderSystem()->destroyHardwareOcclusionQuery( pHOQ[nNewQuery] ) ;
		
		OgreFramework::getSingletonPtr()->m_pSceneMgr->removeRenderQueueListener(rqListener) ;
		delete rqListener ;
	}

	OgreFramework::getSingletonPtr()->m_pLog->logMessage("Main loop quit");
	OgreFramework::getSingletonPtr()->m_pLog->logMessage("Shutdown OGRE...");
}

"In theory there is no difference between practice and theory. In practice, there is." - Psychology Textbook.

Re: Flush GPU command buffer to stop input lag.

Post by palrb »

I am having a problem that acts somewhat similar to this, but not quite... I have tried the suggestions in this thread (the flushing) without any luck. I have a big scene (something like 2.5 million triangles, the triangle count reported by Ogre), and a very variable frame rate. It varies to the extent that the rendering is more or less unusable for the purpose it was meant for (my team is developing a generic simulator software platform). I have tested on different computers, all running Nvidia cards; mine is a Quadro FX 1600M (the others we tried are far more powerful cards). I have disabled all proprietary shaders, we have a controlled main loop running in a managed thread, in a WinForm application. We have tried fullscreen, with and without vsync, etc.

We made a simple grapher to show the temporal variation of the duration of different method calls, and found that the call to renderOneFrame is the one that causes the lagging effects we are struggling with. The vertical axis on the images is duration in milliseconds, and the horizontal axis is frame number (don't mind the horizontal axis numbers, they are never updated, always 0-100).

This first image shows the duration of the main render loop and call to RenderOneFrame when no scene content is visible in the viewport.
[Attachment: 3.jpg]
This is what we get when the whole scene is in the viewport. Needless to say, the variation in the duration of the render loop call is far too big. It makes the whole loop very laggy, including camera movement and object movement, and it's also affecting the physics simulation in a bad way (especially with occasional spikes).
[Attachment: 2.jpg]
Occasionally, the scenario depicted below takes place. Even though we have a constant viewport (not moving any cameras etc.), the duration of the call has sudden drastic drops that can last for at least 1-2 seconds before returning back up.
[Attachment: 1.jpg]
The screen caps of the TimerGraph logger are done during "normal rendering conditions" (not just after loading a scene). It does not behave better for smaller scenes.

I am open to any suggestion... :)

Re: Flush GPU command buffer to stop input lag.

Post by palrb »

An update on this one: it seems the problem appears with models of more than 300,000-400,000 triangles. I am using the generic scene manager. Could this be relevant?

Re: Flush GPU command buffer to stop input lag.

Post by mkultra333 »

I can't help you palrb, except to say that it doesn't appear to be a GPU command buffer issue, at least not in the manner that caused my problems.
_____________________________________________________________________________________

I wanted to add something, a note on the license of the above code. I hate it when people put code on the web and then don't include license info, since it makes it more or less unusable for me. But I just realized I did it myself. :oops:

Feel free to use the code any way you see fit, commercial or otherwise; you don't need to credit me or otherwise reimburse me. It is public domain code. The code is offered "as is" with no warranty or guarantee of fitness.
"In theory there is no difference between practice and theory. In practice, there is." - Psychology Textbook.

Re: Flush GPU command buffer to stop input lag.

Post by sinbad »

Yeah, this is not the same issue - sounds like your vertex load is just too high.

@mkultra333: I wrote up a simple class you can add to any sample to use the HOQ trick over N frames. I currently don't have a case which suffers from this stall, so I'd be grateful if someone would test it for me to see if it fixes the issue (I've tested it to make sure it 'works', but I can't see the effect). http://www.ogre3d.org/wiki/index.php/FlushGPUBuffer

Re: Flush GPU command buffer to stop input lag.

Post by mkultra333 »

I'll get a chance to test it on the weekend.

One thing: the description in the wiki article doesn't sound as if it is describing my problem. I didn't suffer from an inconsistent frame rate; it was pretty steady. It was just that the input was lagging. If I searched for the keywords I used when first trying to find solutions, such as "laggy input," "mouse lag," "sluggish controls," or "slow keyboard response," then I don't think I'd find your article. I'm not sure I'd recognize it as addressing my problem even if I did find it.

The primary issue as I saw it was not a frame rate problem but a control problem that happened to occur at low frame rates.
"In theory there is no difference between practice and theory. In practice, there is." - Psychology Textbook.

Re: Flush GPU command buffer to stop input lag.

Post by mkultra333 »

Just tried it out.

There's a syntax error in the header: you've forgotten the closing "}" before #endif that matches the opening brace after "namespace Ogre".

I fixed that and changed my loop to use it, but it doesn't seem to work. At 25 FPS, my original code works fine but your class doesn't seem to make any difference at all; the lag is back and unaffected by different values for the buffer numbers. Here's how I added it to my loop.

Code:

void DemoApp::runDemo()
{
	OgreFramework::getSingletonPtr()->m_pLog->logMessage("Start main loop...");

	Ogre::GpuCommandBufferFlush mBufferFlush;
	int numberOfQueuedFrames=1 ;
	bool GPUBufferSetupDone=false ;
	
	UINT uFrameStartTime=OgreFramework::getSingletonPtr()->m_pTimer->getMilliseconds();
	UINT uFrameTotalTime=0 ;

	OgreFramework::getSingletonPtr()->m_pRenderWnd->resetStatistics() ;
	


	while(!m_bShutdown && !OgreFramework::getSingletonPtr()->isOgreToBeShutDown()) 
	{
		if(OgreFramework::getSingletonPtr()->m_pRenderWnd->isClosed())m_bShutdown = true;

#if OGRE_PLATFORM == OGRE_PLATFORM_WIN32
			Ogre::WindowEventUtilities::messagePump() ;
#endif	



		if(OgreFramework::getSingletonPtr()->m_pRenderWnd->isActive())
		{

			if(GPUBufferSetupDone==false) // I added it here because I assume I can be very sure there's an active render window by now.
			{
				GPUBufferSetupDone=true ;
				mBufferFlush.start(numberOfQueuedFrames);
			}

			// get start time of frame
			uFrameStartTime=OgreFramework::getSingletonPtr()->m_pTimer->getMicroseconds() ;
				
			// update input and physics
			OgreFramework::getSingletonPtr()->m_pKeyboard->capture();
			OgreFramework::getSingletonPtr()->m_pMouse->capture();
			OgreFramework::getSingletonPtr()->updateOgre(uFrameTotalTime/1000.0f);


				// render the frame
			OgreFramework::getSingletonPtr()->m_pRoot->renderOneFrame();
			


			// calculate frame time.
			uFrameTotalTime=OgreFramework::getSingletonPtr()->m_pTimer->getMicroseconds()-uFrameStartTime ;

		}
		else
		{
			Sleep(1000);
		}
	}
	
	mBufferFlush.stop() ;

	OgreFramework::getSingletonPtr()->m_pLog->logMessage("Main loop quit");
	OgreFramework::getSingletonPtr()->m_pLog->logMessage("Shutdown OGRE...");
}
I tried calling start() just after int numberOfQueuedFrames=1 ; as well but that didn't help.

BTW, I'm thinking of making my code dynamic, so that it doesn't use any buffers if the framerate is below 60 but starts to add a few buffers as the framerate gets higher.
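Roughly what I have in mind, as a sketch (the thresholds and the mapping from FPS to buffer count are arbitrary and would need tuning):

Code:

	// sketch: choose the number of queued frames from the measured frame time.
	// uFrameTotalTime is in microseconds, as in the loop above.
	int nDesiredQueries=1 ;                      // below 60 FPS: flush every frame, no buffering
	if(uFrameTotalTime<1000000/120)
		nDesiredQueries=3 ;                      // above 120 FPS: allow a couple of buffered frames
	else if(uFrameTotalTime<1000000/60)
		nDesiredQueries=2 ;                      // 60-120 FPS: allow one buffered frame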
"In theory there is no difference between practice and theory. In practice, there is." - Psychology Textbook.

Re: Flush GPU command buffer to stop input lag.

Post by sinbad »

Ah yes, that was a bad copy & paste.

Hrm, so I wonder what I did differently to your code? The intention was to recreate what you did in a more generic way.

Re: Flush GPU command buffer to stop input lag.

Post by sinbad »

Aha - I got it. The problem was that the stop() call during start() was queueing the removal of the FrameListener, so in fact the addFrameListener was ignored. I've fixed that now. I've managed to recreate the issue with a windowed, non-VSync demo with quite heavy scene content, and this class does fix the problem.

I've added a note about lagging input to the wiki page, but most people I've heard reporting this find that it's the frame rate stuttering, usually on a regular, periodic basis (like every second).

Re: Flush GPU command buffer to stop input lag.

Post by mkultra333 »

Maybe it's the listener. The original uses RenderQueueListener, but your code seems to just use FrameListener.

Edit: Didn't see your last post, I'll try the new code.
"In theory there is no difference between practice and theory. In practice, there is." - Psychology Textbook.

Re: Flush GPU command buffer to stop input lag.

Post by mkultra333 »

Updated code works fine. :)
"In theory there is no difference between practice and theory. In practice, there is." - Psychology Textbook.

Re: Flush GPU command buffer to stop input lag.

Post by sinbad »

I've managed to create a number of situations (mostly D3D9 windowed mode with fairly heavy scenes) where this class makes frame rates far more consistent. Previously I'd dealt with this via Root::setFrameSmoothingPeriod, but with this class I don't need to use that anymore. I think I may promote this to the core and enable it by default, because a lot of people may not realise what the problem is, or ever find this thread or the wiki article to solve it.
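(For reference, the frame smoothing workaround is just a one-liner; the half-second window is an example value, and mRoot is assumed to be your Ogre::Root pointer:)

Code:

	// average reported frame times over the last half second so spiky frame
	// times don't feed straight into animation / movement calculations
	mRoot->setFrameSmoothingPeriod(0.5f);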

Re: Flush GPU command buffer to stop input lag.

Post by Shimayama »

I am continuing on from palrb's post, since we're working on the same project and we still haven't found a solution to the frame-rate "hiccups". It seems that, in a somewhat regular pattern, the frame rate suddenly increases for just about one frame, which produces a very ugly hiccup. We've tried most of the solutions posted on the forum, but with no results. (The problem is not a low frame rate, but that it varies so much at some frames.)

We've created a simple test scene with some high-poly spheres (about 600,000 triangles × 8 spheres). If we load the test scene in some of the Ogre demos, it seems to run smoothly, but in our application we still have the hiccups. We've tried running it in Ogre's own rendering window, with StartRendering(), RenderOneFrame() or UpdateRenderTarget() for rendering, but still with the same results.

Can the fact that our application is wrapped in C#, or that we're using Ofusion for scene loading, be a source of our problem?
Any suggestions would be appreciated, since we're almost running out of ideas.

Re: Flush GPU command buffer to stop input lag.

Post by sinbad »

If it doesn't have the hiccup in the OGRE demos, and if you're rendering the same scene, then that does point to whatever else is going on in your app and not OGRE. Have you tried profiling to examine what's happening on the CPU when the performance dips?

Re: Flush GPU command buffer to stop input lag.

Post by pjcast »

With a managed language, you could be seeing the hiccup as a result of a garbage collection cycle... depends if you are allocating/destroying a lot of heavy objects often.
Have a question about Input? Video? WGE? Come on over... http://www.wreckedgames.com/forum/

Re: Flush GPU command buffer to stop input lag.

Post by palrb »

pjcast wrote:With a managed language, you could be seeing the hiccup as a result of a garbage collection cycle... depends if you are allocating/destroying a lot of heavy objects often.
Yes, this is a good point. We did look into this, trying a couple of methods to force garbage collection at every frame etc., but with no noticeable effect. Do you have any tips on how to alter the behaviour of the garbage collector to fit better with a demanding render loop (such as a forced garbage collection that actually works)?

Re: Flush GPU command buffer to stop input lag.

Post by DanielSefton »

Tried adding this to my project. It works fine with OpenGL, but I get the following error with DirectX:
OGRE EXCEPTION(3:RenderingAPIException): End occlusion called without matching begin call !! in D3D9HardwareOcclusionQuery::endOcclusionQuery at ..\..\..\RenderSystems\Direct3D9\src\OgreD3D9HardwareOcclusionQuery.cpp (line 103)
All I've added is this after creating the RenderWindow:

Code:

Ogre::GpuCommandBufferFlush bufferFlush;
bufferFlush.start();

Re: Flush GPU command buffer to stop input lag.

Post by Shimayama »

It seems that if the GPU command buffer flush is activated, the application crashes on window resize (using a Windows Form).

Is there some way to modify the GpuCommandBufferFlush class to detect and recover from this error internally, without having to modify the external code to stop and start the flush manually?

Re: Flush GPU command buffer to stop input lag.

Post by mkultra333 »

The code in the wiki doesn't seem to work anymore, although when I tested it back in June last year it was ok.

I ended up sticking with my own version (NOT the wiki version). However, last week I discovered it crashed on someone else's computer in D3D, with the same problem Daniel Sefton posted above when he used the wiki code:

Code:

13:03:58: OGRE EXCEPTION(3:RenderingAPIException): End occlusion called without matching begin call !! in D3D9HardwareOcclusionQuery::endOcclusionQuery at ..\..\..\..\..\RenderSystems\Direct3D9\src\OgreD3D9HardwareOcclusionQuery.cpp (line 102)
So both my old version and the new version seem to have the same problem.

But on top of that, I just added the wiki version back to my project (not knowing about Daniel Sefton's post, I thought maybe it would fix the crash). What I've found is that when it runs on my computer now, it fails to fix the input lag. Seems to have no effect at all.

So the wiki version seems to either have no effect (my computer) or crashes (Daniel Sefton's computer).

The lack of effect on my computer is a bit of a mystery, since it worked when I first tested it... very curious. The only thing I can think of is that something changed when I updated to the latest August 2009 DX9 runtime.

Edit: Ah, forgot the obvious. I also updated to Ogre 1.7. I have no idea if this is an issue in 1.6.3 which is what I used to run.
"In theory there is no difference between practice and theory. In practice, there is." - Psychology Textbook.