For the past few weeks, I've been working on a 2D isometric tile engine using Ogre, but I'm having trouble displaying scenes of significant size. I tried to address the issue by dividing the scene among SceneNodes, but the performance boost I expected did not materialize. I'd like to find out whether my expectations are wrong or whether I'm simply doing something incorrectly.
First, some information about my environment:
Ogre version: 1.9
OS: Windows
Graphics API: DirectX9
Graphics Card: GeForce GTX 670
IDE: MSVC 2012
The world I'm trying to display is a 3D array where every cell is represented by a 2D tile and looks pretty much as follows:

Each tile is drawn using Instancing, specifically HWInstancingBasic. The terrain layers can also be scrolled through, so even tiles which are not visible are assigned an Ogre::InstancedEntity on scene creation. The terrain is also modifiable, which is why I've used individual tiles rather than a bigger mesh (I'm sure I could modify the mesh at run time, but I wanted to keep it simple, as I'm relatively new to graphics).
The first time I created a world of decent size (64x64x18, or 73,728 tiles), I saw that I was getting very low FPS. I therefore subdivided the scene into nodes of 16x16x1, but that helped very little. Given that a lot of the terrain is hidden (the above image is a bit jaggier than my usual scenes), I was expecting the FPS to increase by quite a bit.
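To make the subdivision concrete, here is a minimal sketch in plain C++ of the arithmetic behind it. It contains no Ogre calls, and the names (Extent, chunkCount, tilesPerChunk, chunkIndexOf) are hypothetical helpers made up for illustration, not part of my engine or of Ogre.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper type: a size in tiles along each axis.
struct Extent { std::size_t x, y, z; };

// Number of chunks needed to cover the world, rounding up on each axis.
inline std::size_t chunkCount(Extent world, Extent chunk) {
    return ((world.x + chunk.x - 1) / chunk.x) *
           ((world.y + chunk.y - 1) / chunk.y) *
           ((world.z + chunk.z - 1) / chunk.z);
}

// Number of tiles (and therefore instanced entities) in one full chunk.
inline std::size_t tilesPerChunk(Extent chunk) {
    return chunk.x * chunk.y * chunk.z;
}

// Flat index of the chunk (one SceneNode per chunk) that owns tile (tx, ty, tz).
// Chunks are numbered x-fastest, then y, then z (layer).
inline std::size_t chunkIndexOf(Extent world, Extent chunk,
                                std::size_t tx, std::size_t ty, std::size_t tz) {
    assert(tx < world.x && ty < world.y && tz < world.z);
    const std::size_t chunksX = (world.x + chunk.x - 1) / chunk.x;
    const std::size_t chunksY = (world.y + chunk.y - 1) / chunk.y;
    return (tz / chunk.z) * chunksY * chunksX
         + (ty / chunk.y) * chunksX
         + (tx / chunk.x);
}
```

For a 64x64x18 world split into 16x16x1 chunks, this gives 288 nodes of 256 tiles each.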
Tracing through Ogre, I noticed that the buffer sent to the card for instancing is updated only once during the Octree walk. More specifically, InstanceBatchHW::updateVertexBuffer is called only once (and yes, there is only one batch). That made me wonder whether I am combining SceneNodes and Instancing correctly. I also noticed that when InstanceBatchHW::updateVertexBuffer is called, the code walks through all instances, even those not in the node.
My question is ultimately this: how should Instancing and SceneNodes be combined to maximize performance? Do I perhaps need to have the InstanceManager create smaller batches and assign the instances in one batch to one SceneNode? Or is my overall approach to the scene simply not the best one?
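To illustrate why I suspect batch size matters, here is a toy cost model in plain C++. It is not Ogre code. It assumes, without my having verified it, that a batch with no visible chunk is skipped entirely, while a batch with at least one visible chunk has all of its instances walked, as I observed in InstanceBatchHW::updateVertexBuffer. The function name and parameters are hypothetical.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Toy model (assumption, not Ogre behaviour I have confirmed):
// chunks are grouped into batches of `chunksPerBatch` consecutive chunks.
// A batch is walked in full if any of its chunks is visible, else skipped.
// Returns the number of instances walked in one frame.
inline std::size_t instancesWalked(const std::vector<bool>& chunkVisible,
                                   std::size_t tilesPerChunk,
                                   std::size_t chunksPerBatch) {
    if (chunksPerBatch == 0) return 0;  // guard against a zero step below
    std::size_t walked = 0;
    for (std::size_t first = 0; first < chunkVisible.size(); first += chunksPerBatch) {
        const std::size_t last = std::min(first + chunksPerBatch, chunkVisible.size());
        bool anyVisible = false;
        for (std::size_t i = first; i < last; ++i) {
            if (chunkVisible[i]) { anyVisible = true; break; }
        }
        if (anyVisible) walked += (last - first) * tilesPerChunk;
    }
    return walked;
}
```

Under these assumptions, if only the 16 chunks of the top layer are visible out of 288 chunks of 256 tiles, a single batch covering everything walks all 73,728 instances, whereas one batch per chunk walks only 4,096. Whether Ogre actually culls at the batch level this way is exactly what I am unsure about.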
Any suggestions or insights are welcome. Let me know if there is any more information I can provide.

