So, in summary, the code to set up the instances would look like this:
Code:
InstanceManager *instanceManager = m_sceneMgr->createInstanceManager();
instanceManager->setMeshName( "myOriginalMesh.mesh" );
instanceManager->setMaterialName( "instancedMaterial.material" );
instanceManager->setInstancingMethod( InstanceManager::ConstantRegisters /*or InstanceManager::VertexBufferSource*/ );
instanceManager->setIdealInstancesPerBatch( 50 ); //Similar to std::vector::reserve()
InstancedEntity *instancedEntity = instanceManager->createInstancedEntity( "uniqueInstanceName" );
sceneNode->attachObject( instancedEntity );
/*instancedEntity->getWorldBoundingBox() works like a regular Entity::getWorldBoundingBox()*/
Alternatively, you could call:
Code:
InstanceManager *instanceManager = m_sceneMgr->createInstanceManager( "myOriginalMesh.mesh", "instancedMaterial.material", InstanceManager::ConstantRegisters );
InstancedEntity *instancedEntity = m_sceneMgr->getInstancedManager( "myOriginalMesh.mesh", "instancedMaterial.material", InstanceManager::ConstantRegisters )->createInstancedEntity();
The system will automatically build more batches when more instances are created than the per-batch limit (usually 80) allows. This should be completely hidden from the developer, in the same way a C++ programmer doesn't need to worry about reallocating memory when using std::vector::push_back; yet he can't ignore the performance hit if he doesn't hint through std::vector::reserve how much memory he expects to need (to avoid unnecessary reallocations).
This is where InstanceManager::setIdealInstancesPerBatch comes into play. It is useful when you have an estimate of how many objects will be in your scene on average.
For example, let's say you know for sure there will always be 90 instances, while the hardware limit is 80 per batch.
When the developer creates 90 instances, the InstanceManager would create 2 batches of 80 instances each (160 in total).
70 instances would be wasted. That could cost a lot of vertex shader processing power.
So, in this case, InstanceManager::setIdealInstancesPerBatch( 45 ) is called, so that 2 batches of 45 instances each are created. There are still 2 draw calls (which means no CPU perf gain), but the vertex shader processes 90 instances instead of 160.
However, if he adds one more instance (which makes 91), a 3rd batch would be needed. That increases CPU usage (3 draw calls are needed) and 135 instances are now processed, which means 44 instances are wasted in the vertex shader.
The right value for InstanceManager::setIdealInstancesPerBatch will vary with the needs of each application, and it is best defaulted to the maximum the hardware allows.
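To double-check the numbers in the example above, here is a small sketch of the arithmetic. BatchUsage and computeBatchUsage are names I made up for illustration (they are not part of Ogre or of the proposed API), and the sketch assumes every batch always processes its full capacity, as described above.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper, not part of Ogre: how a given number of instances
// maps onto fixed-size batches.
struct BatchUsage
{
    std::size_t batches;    // number of batches, i.e. draw calls
    std::size_t processed;  // instance slots the vertex shader runs through
    std::size_t wasted;     // processed slots that hold no live instance
};

inline BatchUsage computeBatchUsage( std::size_t numInstances, std::size_t instancesPerBatch )
{
    assert( instancesPerBatch > 0 );

    BatchUsage usage;
    // Round up: a partially filled batch still costs a whole draw call.
    usage.batches   = ( numInstances + instancesPerBatch - 1 ) / instancesPerBatch;
    // Assumption: each batch processes its full capacity, filled or not.
    usage.processed = usage.batches * instancesPerBatch;
    usage.wasted    = usage.processed - numInstances;
    return usage;
}
```

With 90 instances, a batch size of 80 gives 2 batches and 70 wasted slots, while a batch size of 45 gives 2 batches and no waste. With 91 instances and a batch size of 45 it gives 3 batches and 44 wasted slots.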
With some instancing methods (e.g. InstanceManager::VertexBufferSource) there is no hardware limit on the number of instances per batch (actually, the limit would be 65535 / numMeshVertices if we're working with 16-bit indices), so in this case calling InstanceManager::setIdealInstancesPerBatch could be mandatory.
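As a rough sketch of that 16-bit index limit (the function name is made up for illustration, and the 65535 figure is simply the one quoted above):

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper, not part of Ogre: upper bound on instances per batch
// when the whole batch has to be addressed with 16-bit indices.
inline std::size_t maxInstancesFor16BitIndices( std::size_t numMeshVertices )
{
    assert( numMeshVertices > 0 );
    // Integer division rounds down: a partial copy of the mesh is useless.
    return 65535u / numMeshVertices;
}
```

A 500-vertex mesh would fit 131 times in one batch, a 1000-vertex mesh only 65 times. A result of 0 means the mesh alone is too big for 16-bit indices.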
PS: Maybe I should start a new topic? This is getting off-topic...
The only limitation I see is that I'm not sure there's a good way to know how many instances per batch are allowed (in the ConstantRegisters method), because it varies not only with the number of constant registers but also with the number of uniform parameters (which take away constant registers) used by the coder. Unless there's a clean, portable way to determine how many unused constant registers are available, some input from the programmer would be required.
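To illustrate what that programmer input could look like, here is a minimal sketch. Everything in it is an assumption on my part: the function name, the idea that the programmer passes in how many registers his own uniforms use, and the figure of 3 registers per instance (one 3x4 world matrix) used in the example.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper, not part of Ogre: estimates how many instances fit in
// one batch with the ConstantRegisters method.
//  totalRegisters         - constant registers the shader model offers (e.g. 256)
//  registersUsedByShader  - registers taken by the coder's own uniforms;
//                           this is the input the programmer would supply
//  registersPerInstance   - registers each instance needs (assumed: 3 for a
//                           3x4 world matrix)
inline std::size_t estimateInstancesPerBatch( std::size_t totalRegisters,
                                              std::size_t registersUsedByShader,
                                              std::size_t registersPerInstance )
{
    assert( registersPerInstance > 0 );
    if( registersUsedByShader >= totalRegisters )
        return 0; // nothing left for instance data
    return ( totalRegisters - registersUsedByShader ) / registersPerInstance;
}
```

Under these assumptions, 256 registers with 16 of them taken by other uniforms leaves room for 80 instances per batch.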
In case you're wondering: I've already tried a system (not specifically for instancing, actually) where I derive classes from MovableObject so I can use them independently with SceneNodes while they actually form part of a larger system. Works like a charm. However, I was using a RenderOperation that was shared across all sub-objects, not a completely empty one. Maybe using an empty one would crash Ogre, but that could be easily fixed.
Another advantage of this is that these objects get notified when they become "dirty" (i.e. the scene node's position was updated), so you can tell the parent InstanceManager to update the constant registers of the batch the object belongs to.
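A minimal, Ogre-free sketch of that dirty notification idea follows. The class names (BatchSketch, InstanceSketch) and their methods are invented for illustration; in the real thing the instance would derive from MovableObject and the "upload" would rewrite the batch's constant registers.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Vec3 { float x, y, z; };

// Hypothetical stand-in for one batch of instances.
class BatchSketch
{
public:
    explicit BatchSketch( std::size_t capacity ) :
        m_positions( capacity, Vec3{ 0.0f, 0.0f, 0.0f } ), m_dirty( false ), m_uploads( 0 ) {}

    // Called by an instance whenever its transform changes.
    void notifyMoved( std::size_t slot, const Vec3 &pos )
    {
        assert( slot < m_positions.size() );
        m_positions[slot] = pos;
        m_dirty = true;
    }

    bool isDirty() const { return m_dirty; }

    // Stand-in for re-uploading the constant registers. Does nothing when no
    // instance has moved; returns true only if an upload happened.
    bool updateIfDirty()
    {
        if( !m_dirty )
            return false;
        ++m_uploads;
        m_dirty = false;
        return true;
    }

    std::size_t uploadCount() const { return m_uploads; }

private:
    std::vector<Vec3> m_positions;
    bool              m_dirty;
    std::size_t       m_uploads;
};

// Hypothetical stand-in for one instanced object attached to a scene node.
class InstanceSketch
{
public:
    InstanceSketch( BatchSketch *batch, std::size_t slot ) : m_batch( batch ), m_slot( slot ) {}

    // Moving the instance marks its parent batch as dirty.
    void setPosition( const Vec3 &pos ) { m_batch->notifyMoved( m_slot, pos ); }

private:
    BatchSketch *m_batch;
    std::size_t  m_slot;
};
```

Several instances moving in the same frame still cause only one upload per batch, and a batch where nothing moved costs nothing.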
Another alternative to ConstantRegisters & VertexBufferSource would be a 1D texture map read with VTF (vertex texture fetch). But that is close to pointless (though interesting once we get the other 2 running), since it's only efficient and well supported on G80 and later hardware, which is already DX10-capable, and DX10 comes with 4096 constant registers as opposed to the mere 256 of SM 3.0.