New InstanceManager: Instancing done the right way

masterfalcon
OGRE Team Member
Posts: 4270
Joined: Sun Feb 25, 2007 4:56 am
Location: Bloomington, MN
x 126

Re: New InstanceManager: Instancing done the right way

Post by masterfalcon »

Ok, I can close them. I'm not sure if I can admin though.

This is the one that you didn't apply, correct? http://sourceforge.net/tracker/?func=de ... tid=302997
dark_sylinc
OGRE Team Member
Posts: 5429
Joined: Sat Jul 21, 2007 4:55 pm
Location: Buenos Aires, Argentina
x 1337

Re: New InstanceManager: Instancing done the right way

Post by dark_sylinc »

Indeed, only a small portion of that patch was applied. The other one (this one) was completely applied.
m2codeGEN
Halfling
Posts: 52
Joined: Tue Apr 26, 2011 9:13 am
Location: Russia, Tver
x 2

Re: New InstanceManager: Instancing done the right way

Post by m2codeGEN »

I close it (ID: 3483836)
Arkiruthis
Gremlin
Posts: 178
Joined: Fri Dec 24, 2010 7:55 pm
x 10

Re: New InstanceManager: Instancing done the right way

Post by Arkiruthis »

Out of curiosity, what kind of results are people getting for the new InstanceManager on Mac OSX? (I'm using the latest 1.8, built on OSX, 32bit, i386)

I currently have 500 "robot.mesh" entities in a Bullet physics environment all wandering about and these are the results I'm seeing. (bearing in mind that OSX - or at least my 2010 15" Macbook - doesn't seem to support the basic hardware accelerated modes)

[edit - additional info, entities are moving and animated, the non-instanced via scene nodes, the instanced directly]

Non-instanced:
FPS: 15.1
Tri: 154,606
Batches: 514

Instanced (ShaderBased)
FPS: 19.0
Tri: 154,606
Batches: 139

Instanced (TextureVTF)
FPS: 20.9
Tri: 154,604
Batches: 19

Oddly enough, the non-instanced mode is fastest if you zoom in to an area that has, say, 50-100 entities in it. The TextureVTF one being the worst, but then again, it remains consistent at 19 batches which is good.

I set the InstanceManager to 100 instances per batch; I've no idea what value (if it's even relevant) is advisable here.
dark_sylinc
OGRE Team Member
Posts: 5429
Joined: Sat Jul 21, 2007 4:55 pm
Location: Buenos Aires, Argentina
x 1337

Re: New InstanceManager: Instancing done the right way

Post by dark_sylinc »

Hi, performance gains are very "it depends".

You get faster performance when you zoom in with no instancing because Ogre can cull better, while with instancing on, your GPU still has to process large quantities of unnecessary vertices.

HW Instancing is a lot better at culling, so it also gets better when zooming in; but you don't have those techniques available :(

TextureVTF consumes quite a bit of bandwidth (both bus & GPU). A desktop PC handles it very well, and very fast. But tests show that laptops don't have quite enough bus bandwidth. Even powerful notebooks struggle with large numbers of entities with TextureVTF, making it possibly the worst technique on those machines.

The HW VTF technique eliminates that downside and is almost always the winner in performance.
Arkiruthis wrote: Bearing in mind that OSX - or at least my 2010 15" Macbook - doesn't seem to support the basic hardware accelerated modes
What GPU do you have?
Try updating your video card drivers. HW Instancing is very old (like 6 years old, or more?), but it has only been added to OpenGL very recently (was it last year I think?). So it's very possible your card supports HW instancing but you need newer drivers.
Arkiruthis wrote: I set the InstanceManager to 100 per-batch, I've no idea quite what value (if it's relevant) is advisable to put in here?
"It depends".
Number of entities to be looking at:
Ideally the batch size should match the number of entities that will be in view at any one time. So, if you have 1000 entities but you'll constantly be looking at 50 at a time, use 50. But it will get worse if you suddenly look at all 1000 entities.

Fragmentation:
Take into account that you may end up looking at 20 entities from one batch, 20 from another batch, and 10 from a third batch. That's three batches, so the GPU has to process 3 x 50 = 150 models. We call that fragmentation. Create your instances in order so they are kept close together.
If you raise the batch size to 150, you will most likely be less prone to this problem.
You can debug fragmentation by turning on the Batches' bounding box (the fewer overlaps, the better) by calling:

Code:

// instanceManager is your Ogre::InstanceManager*
instanceManager->setSetting( Ogre::InstanceManager::SHOW_BOUNDINGBOX, true );
The aid of tools like NVPerfHUD or PIX is highly recommended for debugging fragmentation (unfortunately they are not available for Mac).
There's a function called "Defragment Batches" that will automatically reassign instanced entities to existing batches, alleviating this problem. Call this function at intervals (not every frame, it's slow) if you have moving objects that have become so mixed up that fragmentation is a problem.
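
The fragmentation arithmetic above can be sketched in a few lines of standalone C++. This is only an illustration, not Ogre code: instancesProcessed and its parameters are hypothetical names, and it assumes a technique where every batch that is touched gets processed in full.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

// Hypothetical helper (not part of Ogre): given the batch index of each
// visible entity, estimate how many instances the GPU has to process when
// every touched batch is drawn in full.
std::size_t instancesProcessed(const std::vector<std::size_t>& visibleEntityBatchIds,
                               std::size_t instancesPerBatch)
{
    assert(instancesPerBatch > 0);
    // Each distinct batch id costs one full batch, however few of its
    // entities are actually on screen.
    const std::set<std::size_t> touched(visibleEntityBatchIds.begin(),
                                        visibleEntityBatchIds.end());
    return touched.size() * instancesPerBatch;
}
```

With 20 + 20 + 10 visible entities spread over three batches of 50 this gives 150; the same 50 entities packed into a single batch would cost 50.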

Where the Bottleneck is:
If your game is GPU bound, lower the batch count.
If your game is CPU bound, increase the batch count.

So, the number of batches depends on a lot of factors. The best advice is "try it, profile". That's why the 'New Instancing' demo has so many options: it allows you to test your own typical scenario.
Place the camera at the point where you see the number of instances you would typically be seeing and play with the parameters.

Cheers
Dark Sylinc
Arkiruthis
Gremlin
Posts: 178
Joined: Fri Dec 24, 2010 7:55 pm
x 10

Re: New InstanceManager: Instancing done the right way

Post by Arkiruthis »

dark_sylinc: Thanks for the info, much appreciated! I switched on the debug boxes to see how it did things.
dark_sylinc wrote: HW Instancing is a lot better at culling, so it also gets better when zooming in; but you don't have those techniques available :(
I'm embarrassed to say it, but I was certain this macbook had a decent graphics card (ATI, etc.) but it turns out it just has an Intel HD which explains everything!! It seems to handle shaders okay, so that will explain the decent TextureVTF for GPU but as you say, bandwidth will be an issue.

The CPU being the bottleneck, I increased the batch count and it seems to have given it a little boost! To be honest, I'm not needing the new instancing for anything critical, I'm just fascinated by its development and interested in testing it at every opportunity. As ever, thanks for your work on this. :)
m2codeGEN
Halfling
Posts: 52
Joined: Tue Apr 26, 2011 9:13 am
Location: Russia, Tver
x 2

Re: New InstanceManager: Instancing done the right way

Post by m2codeGEN »

Even on an NVIDIA GeForce GTX 480 you can run into driver problems with hardware instancing (ARB_instanced_arrays). With the 19x drivers it didn't work; with 25x everything is fine.
dark_sylinc
OGRE Team Member
Posts: 5429
Joined: Sat Jul 21, 2007 4:55 pm
Location: Buenos Aires, Argentina
x 1337

Re: New InstanceManager: Instancing done the right way

Post by dark_sylinc »

Indeed, it's almost always a driver issue in this case using OpenGL.
Also, check whether your Mac has one of those dual-GPU setups that switch between the ATI/NVIDIA & Intel GPUs to save battery.

I've heard the ATI/NVIDIA kicks in only if the App is digitally signed (aka. 'certified to drain your battery & cause a lot of heat') or you've explicitly enabled your own apps in the driver's control panel.
Transporter
Minaton
Posts: 933
Joined: Mon Mar 05, 2012 11:37 am
Location: Germany
x 110

Re: New InstanceManager: Instancing done the right way

Post by Transporter »

Gdlk wrote:The question is: how can I generate a mesh supported by the InstanceManager (i.e. without shared vertices)? I had the whole system ready to work, and with the robot mesh it works great, but with my mesh (exported from Blender) it doesn't work. With the debugger I found that the shared vertices line is where it crashes =( (the robot mesh has it set to false, my mesh to true)
Same problem here! I've built a simple cube with blender and exported it to Ogre. It's crashing at OgreInstanceBatchHW_VTF.cpp line 356:

Code:

if( baseSubMesh->vertexData->vertexDeclaration->getNextFreeTextureCoordinate() > 8 - neededTextureCoord )
vertexData is a null pointer. How can I create correct meshes? How do you create these meshes? 3ds Max?
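
A minimal sketch of the likely cause, assuming that a submesh which uses shared vertices keeps its geometry in the parent mesh and leaves its own vertexData null. The structs below are simplified stand-ins, not the real Ogre classes (only the member names useSharedVertices, vertexData and sharedVertexData mirror Ogre's), and findSharedVertexSubMesh is a hypothetical helper.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Simplified stand-ins for illustration only; these are NOT the Ogre classes.
struct VertexData {};

struct SubMesh
{
    bool        useSharedVertices; // true: geometry lives in the parent mesh
    VertexData* vertexData;        // assumed null when useSharedVertices is true
};

struct Mesh
{
    VertexData*          sharedVertexData; // shared geometry, may be null
    std::vector<SubMesh> subMeshes;
};

// Hypothetical helper: returns the index of the first submesh that has no
// vertex data of its own, or -1 if every submesh is self-contained.
int findSharedVertexSubMesh(const Mesh& mesh)
{
    for (std::size_t i = 0; i < mesh.subMeshes.size(); ++i)
    {
        const SubMesh& sub = mesh.subMeshes[i];
        if (sub.useSharedVertices || sub.vertexData == 0)
            return static_cast<int>(i);
    }
    return -1;
}
```

A check along these lines, run before handing a mesh to the InstanceManager, would turn the crash into a clear error message.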
dark_sylinc
OGRE Team Member
Posts: 5429
Joined: Sat Jul 21, 2007 4:55 pm
Location: Buenos Aires, Argentina
x 1337

Re: New InstanceManager: Instancing done the right way

Post by dark_sylinc »

How did you manage to use the blender exporter with shared vertices? The official Blender exporter exports to non-shared sections...
Transporter
Minaton
Posts: 933
Joined: Mon Mar 05, 2012 11:37 am
Location: Germany
x 110

Re: New InstanceManager: Instancing done the right way

Post by Transporter »

dark_sylinc wrote:How did you manage to use the blender exporter with shared vertices? The official Blender exporter exports to non-shared sections...
http://www.ogre3d.org/forums/viewtopic.php?f=8&t=61485
blender2ogre-0.5.6preview7.zip
Zamx
Halfling
Posts: 54
Joined: Sat Feb 14, 2009 5:40 pm

Re: New InstanceManager: Instancing done the right way

Post by Zamx »

manituan wrote:Hi. I really appreciate your work.
I'm implementing your sample in my Ogre 1.8 project and I only see one mesh, but the triangle count shows 9 million of them. The engine doesn't show them to me.
Do you know what's going on? Thanks.
I'm having the same problem, using HW instancing. I tested it with your shaders from the samples and also debugged the function InstanceBatchHW::updateVertexBuffer, where the vertex buffer is filled correctly with the transformed matrices. So I think the problem is the link between the vertex buffer of matrices and the shaders.

I see you pass the matrices with a vertexbuffer. The vertexbuffer is created in the function InstanceBatchHW::setupVertices :

Code:

size_t offset = 0;
unsigned short nextTexCoord = thisVertexData->vertexDeclaration->getNextFreeTextureCoordinate();
const unsigned short newSource = thisVertexData->vertexDeclaration->getMaxSource() + 1;
for( int i=0; i<3; ++i )
{
    thisVertexData->vertexDeclaration->addElement( newSource, offset, VET_FLOAT4,
                                                   VES_TEXTURE_COORDINATES, nextTexCoord++ );
    offset = thisVertexData->vertexDeclaration->getVertexSize( newSource );
}

//Create the vertex buffer containing per instance data
HardwareVertexBufferSharedPtr vertexBuffer =
    HardwareBufferManager::getSingleton().createVertexBuffer(
        thisVertexData->vertexDeclaration->getVertexSize(newSource),
        mInstancesPerBatch,
        HardwareBuffer::HBU_STATIC_WRITE_ONLY );
thisVertexData->vertexBufferBinding->setBinding( newSource, vertexBuffer );
But I bind my vertexbuffers to mesh like this :

Code:

        vertexDecl->addElement(0, 0, Ogre::VET_FLOAT3, Ogre::VES_POSITION);
        vertexDecl->addElement(1, 0, Ogre::VET_FLOAT3, Ogre::VES_NORMAL);
        vertexDecl->addElement(2, 0, Ogre::VET_FLOAT2, Ogre::VES_TEXTURE_COORDINATES, 0);
Could this be a problem for the shader in getting the correct world matrix?

P.S.: Found a small bug in OgreBatchInstance: because it inherits from both MovableObject and Renderable, the function setUserAny is ambiguous.
Frederic66
Gnoblar
Posts: 4
Joined: Wed Nov 02, 2011 1:36 pm

Re: New InstanceManager: Instancing done the right way

Post by Frederic66 »

How to test the instancing demo with all the latest changes?
************************************************************************
Dear Ogre Community,

This is my first post here at Ogre forums. Let me begin by thanking all the tireless contributors here for the splendid work they have put in over the years.
I am working on an indie RTS game and have been following this thread with acute interest. I would like to test the instancing demo on my system with all the latest changes. Hope I could get some pointers.

First things first:
System specs: Win XP , Radeon HD 7XXX video card, 4G RAM.
Dev Environment : Visual Studio Express 2008

What I have done so far:
Have installed all the prereqs ( Boost, CMake).
Have downloaded and built all the Ogre dependencies.
Have downloaded the V 1.7-4 Ogre SDK. Have built the Ogre demos in DEBUG and RELEASE mode. All demos working OK.
Have downloaded the V 1.7-4 Ogre source. Have built the Ogre demos in DEBUG and RELEASE mode. All demos working OK.

My question:
If I wanted to test the Ogre instancing demo with all the latest changes by Mattan Furst and m2codeGEN, how should I go about it?
Is there a source code package that is meant just for testing instancing?
Or do I download a particular unstable branch (say 1.8) and build from it?

Any tips will be highly appreciated.
Thanks in advance and best regards

Fred
Mentol
Gnoblar
Posts: 4
Joined: Thu Apr 05, 2012 10:02 am

Re: New InstanceManager: Instancing done the right way

Post by Mentol »

Frederic66 wrote:Or do I download a particular unstable branch (say 1.8) and build from it?
Yep, you should get the latest source code from Mercurial from here: https://bitbucket.org/sinbad/ogre/src/

That contains the NewInstancing demo which uses the hardware instancing.

Now a question of my own :)

I would like to convert the NewInstancing demo to use HLSL, just the HW+VTF technique which is the one I'm interested in. I'm an Ogre noobie, so first, is it possible and straightforward? If so, how would I go about converting this, for example?
vertex_program Ogre/Instancing/HW_VTF_cg_vs cg
{
    source HW_VTFInstancing.cg
    entry_point main_vs
    profiles vs_3_0 vp40

    compile_arguments -DDEPTH_SHADOWRECEIVER

    uses_vertex_texture_fetch true
}
I tried replacing cg with hlsl, and profiles with target, but target only accepts one parameter, so I'm not sure what to do...
bstone
OGRE Expert User
Posts: 1920
Joined: Sun Feb 19, 2012 9:24 pm
Location: Russia
x 201

Re: New InstanceManager: Instancing done the right way

Post by bstone »

Use vs_x_x and ps_x_x as your target. The other alternative profiles are for targeting OpenGL.
dark_sylinc
OGRE Team Member
Posts: 5429
Joined: Sat Jul 21, 2007 4:55 pm
Location: Buenos Aires, Argentina
x 1337

Re: New InstanceManager: Instancing done the right way

Post by dark_sylinc »

Zamx wrote:
manituan wrote:Hi. I really appreciate your work.
I'm implementing your sample in my Ogre 1.8 project and I only see one mesh, but the triangle count shows 9 million of them. The engine doesn't show them to me.
Do you know what's going on? Thanks.
Unless I'm missing something, I don't see anything wrong in the declaration & the code. It would help to have a way to reproduce the bug (a test exe with src, maybe?)

Beware that I've seen similar cases which were caused by incorrect shaders, mostly because the model had more weights per vertex than the shader expects.
Try first using IM_FORCEONEWEIGHT and the default shaders to see if the problem persists. Tools like PIX or NVPerfHUD may help you see what the final declaration looks like. The DirectX Debug Runtimes may also help in diagnosing the problem.
Zamx wrote: P.S.: Found a small bug in OgreBatchInstance: because it inherits from both MovableObject and Renderable, the function setUserAny is ambiguous.
Thanks, writing "using MovableObject::setUserAny;" inside the class should fix it IIRC. Will look into it later.
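
For readers unfamiliar with this kind of ambiguity, here is a minimal standalone sketch. The classes are toy stand-ins named after the Ogre ones, with a plain int payload instead of Ogre::Any and made-up member variables; only the using-declaration itself corresponds to the fix mentioned above.

```cpp
#include <cassert>

// Toy stand-ins for illustration; the real Ogre classes are far larger.
struct MovableObject
{
    int movableAny;
    MovableObject() : movableAny(0) {}
    void setUserAny(int value) { movableAny = value; }
};

struct Renderable
{
    int renderableAny;
    Renderable() : renderableAny(0) {}
    void setUserAny(int value) { renderableAny = value; }
};

struct InstanceBatch : public MovableObject, public Renderable
{
    // Without this line, batch.setUserAny(42) fails to compile because the
    // name is found in both base classes. The using-declaration selects the
    // MovableObject version.
    using MovableObject::setUserAny;
};
```

Callers that really want the other version can still write batch.Renderable::setUserAny(...) explicitly.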
Mentol wrote:I tried replacing cg with hlsl, and profiles with target, but target only accepts one parameter, so I'm not sure what to do...
Conversion is straightforward :)
Use "profiles ps_3_0" & "profiles vs_3_0". "vp40" & "fp40" are OpenGL profiles, hence the error.
If it doesn't work correctly after that, you may need to play with column_major_matrices and note that

Code:

compile_arguments -DMYMACRO -DExample
translates to

Code:

preprocessor_defines MYMACRO=1,Example=1
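
The translation rule above can be sketched as a tiny string conversion. This is an illustrative helper, not an Ogre function: toPreprocessorDefines is a hypothetical name, and it assumes every relevant argument is a plain -DNAME or -DNAME=VALUE token separated by spaces.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical helper: turns "-DMYMACRO -DExample" into "MYMACRO=1,Example=1".
// Tokens that do not start with "-D" are skipped.
std::string toPreprocessorDefines(const std::string& compileArguments)
{
    std::istringstream tokens(compileArguments);
    std::string token;
    std::string result;
    while (tokens >> token)
    {
        if (token.compare(0, 2, "-D") != 0 || token.size() <= 2)
            continue;
        std::string define = token.substr(2);
        // A bare -DNAME behaves like NAME=1, so make the value explicit.
        if (define.find('=') == std::string::npos)
            define += "=1";
        if (!result.empty())
            result += ',';
        result += define;
    }
    return result;
}
```

For the program definition quoted earlier, "-DDEPTH_SHADOWRECEIVER" becomes "DEPTH_SHADOWRECEIVER=1".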
Cheers
Dark Sylinc
m2codeGEN
Halfling
Posts: 52
Joined: Tue Apr 26, 2011 9:13 am
Location: Russia, Tver
x 2

Re: New InstanceManager: Instancing done the right way

Post by m2codeGEN »

It's a mistake

A little more feedback from my side.
It seems to me that the InstanceBatch*::setStaticAndUpdate methods need some attention.

Code:

void InstanceBatchHW::setStaticAndUpdate( bool bStatic )
{
   //We were dirty but didn't update bounds. Do it now.
   if( mKeepStatic && mBoundsDirty )
      mCreator->_addDirtyBatch( this );

   mKeepStatic = bStatic;
   if( mKeepStatic )
   {
      //One final update, since there will be none from now on
      //(except further calls to this function). Pass NULL because
      //we want to include only those who were added to the scene
      //but we don't want to perform culling
      mRenderOperation.numberOfInstances = updateVertexBuffer( 0 );
   }
}
I think an additional call to _updateBounds is necessary when the batch is static, as the scene graph won't take care of a dirty batch.

Code:

void InstanceBatchHW::setStaticAndUpdate( bool bStatic )
{
   //We were dirty but didn't update bounds. Do it now.
   if( mKeepStatic && mBoundsDirty )
      mCreator->_addDirtyBatch( this );

   mKeepStatic = bStatic;
   if( mKeepStatic )
   {
      _updateBounds(); // <-- m2codeGEN edit
      //One final update, since there will be none from now on
      //(except further calls to this function). Pass NULL because
      //we want to include only those who were added to the scene
      //but we don't want to perform culling
      mRenderOperation.numberOfInstances = updateVertexBuffer( 0 );
   }
}
And for HW_VTF

Code:

void InstanceBatchHW_VTF::setStaticAndUpdate( bool bStatic )
{
   //We were dirty but didn't update bounds. Do it now.
   if( mKeepStatic && mBoundsDirty )
      mCreator->_addDirtyBatch( this );

   mKeepStatic = bStatic;
   if( mKeepStatic )
   {
      _updateBounds(); // <-- m2codeGEN edit
      //One final update, since there will be none from now on
      //(except further calls to this function). Pass NULL because
      //we want to include only those who were added to the scene
      //but we don't want to perform culling
      mRenderOperation.numberOfInstances = updateVertexTexture( 0 );
   }
}
Last edited by m2codeGEN on Thu Apr 12, 2012 7:27 am, edited 1 time in total.
dark_sylinc
OGRE Team Member
Posts: 5429
Joined: Sat Jul 21, 2007 4:55 pm
Location: Buenos Aires, Argentina
x 1337

Re: New InstanceManager: Instancing done the right way

Post by dark_sylinc »

This piece of the code:

Code:

//We were dirty but didn't update bounds. Do it now.
if( mKeepStatic && mBoundsDirty )
   mCreator->_addDirtyBatch( this );
Ought to handle the problem as later on InstanceBatch::_updateBounds() will be called. So why are you saying it's not working?
m2codeGEN
Halfling
Posts: 52
Joined: Tue Apr 26, 2011 9:13 am
Location: Russia, Tver
x 2

Re: New InstanceManager: Instancing done the right way

Post by m2codeGEN »

It is my mistake. We use the patched manager.
http://sourceforge.net/tracker/?func=de ... tid=302997

Therefore, before updating the GPU resource, the actual position of the parent SceneNode is needed (it's calculated in _updateBounds).
But thanks for the answer nevertheless. With that, _updateBounds was being called twice, and the second call is no longer necessary.
Zamx
Halfling
Posts: 54
Joined: Sat Feb 14, 2009 5:40 pm

Re: New InstanceManager: Instancing done the right way

Post by Zamx »

dark_sylinc wrote:
Zamx wrote:
manituan wrote:Hi. I really appreciate your work.
I'm implementing your sample in my Ogre 1.8 project and I only see one mesh, but the triangle count shows 9 million of them. The engine doesn't show them to me.
Do you know what's going on? Thanks.
Unless I'm missing something, I don't see anything wrong in the declaration & the code. It would help to have a way to reproduce the bug (a test exe with src, maybe?)

Beware that I've seen similar cases which were caused by incorrect shaders, mostly because the model had more weights per vertex than the shader expects.
Try first using IM_FORCEONEWEIGHT and the default shaders to see if the problem persists. Tools like PIX or NVPerfHUD may help you see what the final declaration looks like. The DirectX Debug Runtimes may also help in diagnosing the problem.
I have found my problem, it is quite embarrassing. In our shaders we had

Code:

worldmatrix * vertex
but it should have been:

Code:

vertex * worldmatrix
Now another question: currently instancing only works for triangles, but we are also using lines to present wireframes. Is this something you are working on, or is it something I could do?
m2codeGEN
Halfling
Posts: 52
Joined: Tue Apr 26, 2011 9:13 am
Location: Russia, Tver
x 2

Re: New InstanceManager: Instancing done the right way

Post by m2codeGEN »

Zamx wrote:I have found my problem, it is quite embarrassing. In our shaders we had

Code:

worldmatrix * vertex
but it should have been:

Code:

vertex * worldmatrix
The instance manager sends row-major matrices to the GPU, so that multiplication order is correct.
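
The two multiplication orders are related by a transpose, which a small standalone check makes concrete. This sketch is plain C++ rather than shader code; Vec4, Mat4 and the function names are made up for the illustration and say nothing about how Ogre itself stores its matrices.

```cpp
#include <cassert>

struct Vec4 { float v[4]; };
struct Mat4 { float m[4][4]; };

// Column-vector convention: result = M * v
Vec4 mulMatVec(const Mat4& M, const Vec4& v)
{
    Vec4 r;
    for (int row = 0; row < 4; ++row)
    {
        r.v[row] = 0.0f;
        for (int col = 0; col < 4; ++col)
            r.v[row] += M.m[row][col] * v.v[col];
    }
    return r;
}

// Row-vector convention: result = v * M
Vec4 mulVecMat(const Vec4& v, const Mat4& M)
{
    Vec4 r;
    for (int col = 0; col < 4; ++col)
    {
        r.v[col] = 0.0f;
        for (int row = 0; row < 4; ++row)
            r.v[col] += v.v[row] * M.m[row][col];
    }
    return r;
}

Mat4 transpose(const Mat4& M)
{
    Mat4 t;
    for (int row = 0; row < 4; ++row)
        for (int col = 0; col < 4; ++col)
            t.m[row][col] = M.m[col][row];
    return t;
}
```

For any matrix W and point p, mulMatVec(W, p) and mulVecMat(p, transpose(W)) give the same result, while mulVecMat(p, W) generally does not. So if the shader sees the matrix in the transposed layout, the operands have to be swapped.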
m2codeGEN
Halfling
Posts: 52
Joined: Tue Apr 26, 2011 9:13 am
Location: Russia, Tver
x 2

Re: New InstanceManager: Instancing done the right way

Post by m2codeGEN »

With nedmalloc, one InstancedEntity allocates 1024 bytes of memory.
1 million trees ~ 1 GB of RAM :evil:

I'm thinking about reimplementing the InstanceSystem
bstone
OGRE Expert User
Posts: 1920
Joined: Sun Feb 19, 2012 9:24 pm
Location: Russia
x 201

Re: New InstanceManager: Instancing done the right way

Post by bstone »

Try allocating more triangles per instance and you should be fine me thinks.
dark_sylinc
OGRE Team Member
Posts: 5429
Joined: Sat Jul 21, 2007 4:55 pm
Location: Buenos Aires, Argentina
x 1337

Re: New InstanceManager: Instancing done the right way

Post by dark_sylinc »

m2codeGEN wrote:With nedmalloc, one InstancedEntity allocates 1024 bytes of memory.
1 million trees ~ 1 GB of RAM :evil:

I'm thinking about reimplementing the InstanceSystem
Your case is quite a specific & particular one. It's interesting to see your results.
It looks to me like InstancedEntity needs a rewrite for your case: one in which they don't inherit from MovableObject (this choice was made for the convenience of Ogre users who are already familiar with it, plus the polymorphism benefits, where swapping between Entities & InstancedEntities would be eased) and only position & orientation are stored, with no cached transform matrices at all (instead, compute them on the fly before updating the vertex texture/vertex buffers/constant registers).

Still, a KB per (instanced) entity is another red flag about why we need to start on Ogre 2.0 sooner rather than later.
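
A rough sketch of the lighter-weight layout described above: store only position and orientation per instance, and build the world matrix on the fly just before it is written out. LightInstance and writeWorldMatrix3x4 are hypothetical names, not existing Ogre types, and the row-major 3x4 output layout is an assumption.

```cpp
#include <cassert>

// Hypothetical per-instance record: 7 floats (typically 28 bytes), no cached matrix.
struct LightInstance
{
    float position[3];
    float orientation[4]; // unit quaternion stored as w, x, y, z
};

// Builds a row-major 3x4 matrix (rotation | translation) into out[12].
void writeWorldMatrix3x4(const LightInstance& inst, float out[12])
{
    const float w = inst.orientation[0], x = inst.orientation[1];
    const float y = inst.orientation[2], z = inst.orientation[3];

    // Standard unit-quaternion to rotation-matrix conversion.
    out[0] = 1.0f - 2.0f * (y * y + z * z);
    out[1] = 2.0f * (x * y - w * z);
    out[2] = 2.0f * (x * z + w * y);
    out[3] = inst.position[0];

    out[4] = 2.0f * (x * y + w * z);
    out[5] = 1.0f - 2.0f * (x * x + z * z);
    out[6] = 2.0f * (y * z - w * x);
    out[7] = inst.position[1];

    out[8]  = 2.0f * (x * z - w * y);
    out[9]  = 2.0f * (y * z + w * x);
    out[10] = 1.0f - 2.0f * (x * x + y * y);
    out[11] = inst.position[2];
}
```

The trade-off is extra arithmetic on every update in exchange for a much smaller per-instance footprint.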
m2codeGEN
Halfling
Posts: 52
Joined: Tue Apr 26, 2011 9:13 am
Location: Russia, Tver
x 2

Re: New InstanceManager: Instancing done the right way

Post by m2codeGEN »

I think the results won't be long in coming, as next week we have to deliver a version for alpha testing.
Currently I encode the position, yaw rotation, scale and height of each tree in 8 bytes.
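
For illustration, one way such an 8-byte record could look. The post does not give the actual layout, so the field widths, the value ranges (kWorldSize, kMaxHeight, kMaxScale) and the function names below are all assumptions.

```cpp
#include <cassert>
#include <cmath>
#include <stdint.h>

// Hypothetical layout: 2 + 2 + 2 + 1 + 1 = 8 bytes per tree.
struct PackedTree
{
    uint16_t x;      // position on the ground plane, quantised
    uint16_t z;
    uint16_t height; // height value, quantised
    uint8_t  yaw;    // 0..255 maps to 0..360 degrees
    uint8_t  scale;  // 0..255 maps to 0..kMaxScale
};

// Assumed value ranges; a real project would pick its own.
const float kWorldSize = 4096.0f; // world units covered by x and z
const float kMaxHeight = 512.0f;
const float kMaxScale  = 4.0f;

// Maps value in [0, maxValue] to an integer in [0, steps], with clamping.
inline unsigned quantise(float value, float maxValue, unsigned steps)
{
    float t = value / maxValue;
    if (t < 0.0f) t = 0.0f;
    if (t > 1.0f) t = 1.0f;
    return static_cast<unsigned>(t * static_cast<float>(steps) + 0.5f);
}

PackedTree packTree(float x, float z, float height, float yawDegrees, float scale)
{
    PackedTree t;
    t.x      = static_cast<uint16_t>(quantise(x, kWorldSize, 65535u));
    t.z      = static_cast<uint16_t>(quantise(z, kWorldSize, 65535u));
    t.height = static_cast<uint16_t>(quantise(height, kMaxHeight, 65535u));
    // Wrap yaw into [0, 360) and use 256 steps so that 360 maps back to 0.
    float wrapped = std::fmod(yawDegrees, 360.0f);
    if (wrapped < 0.0f) wrapped += 360.0f;
    t.yaw    = static_cast<uint8_t>(static_cast<unsigned>(wrapped / 360.0f * 256.0f) & 0xFFu);
    t.scale  = static_cast<uint8_t>(quantise(scale, kMaxScale, 255u));
    return t;
}

float unpackYawDegrees(const PackedTree& t) { return t.yaw * (360.0f / 256.0f); }
```

At 8 bytes each, one million trees take about 8 MB instead of roughly 1 GB, at the cost of quantised precision.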