OK, here goes.
The issue of texturing in W3D/M is really bound up with broader issues surrounding the poxel engine and the game as a whole. Before W3D, I'd always started with a tech concept and built the game around what that technology was capable of. With W3D, that wasn't an option. Worms already had a very distinct graphical style which needed to be reproduced in 3D.
We looked at the existing marching boxes demos, but although they were very impressive, they didn't actually LOOK any good. All the coders would stand around marvelling at the achievement, and all the artists would stand around saying "Hell, I can't make anything look good using that!" And at the end of the day, an engine only looks as good as the artists can make it look.
It was therefore very quickly apparent that a straightforward voxel lattice wasn't going to work. Plus the game had to run on a 32MB PS2, so setting aside 16MB or so for a voxel array of reasonable resolution was impossible. And as for texturing - well, having a 'material' for each voxel just doesn't cut it: an artist needs to be able to put a door texture on a door, a window texture on a window - they need real control.
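To put the memory figure in perspective, here is a quick back-of-the-envelope calculation. The resolution (256 cells per axis) and the one byte per voxel are assumptions chosen for illustration; the post gives only the 16MB total.

```python
MIB = 1024 * 1024

def lattice_bytes(nx, ny, nz, bytes_per_voxel=1):
    """Memory needed for a dense voxel array of nx * ny * nz cells."""
    return nx * ny * nz * bytes_per_voxel

# Hypothetical resolution: 256 cells per axis at one byte per voxel.
size = lattice_bytes(256, 256, 256)
print(size / MIB)         # 16.0
print(size / (32 * MIB))  # 0.5, i.e. half of a 32MB machine
```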
Taking all this into account, I replaced the idea of a single voxel lattice with multiple ones, each of which could be positioned and oriented independently. At first, the idea was to have, say, one lattice for a house, another for a tree, with voxels within those lattices used to produce the detailed structure. To facilitate the organic Worms style, I also added progressively more deformation options to the lattices: first a heightmap applied across the X/Y plane of the lattice, then two independent shear/taper deformations along the X/Z and Y/Z planes, and finally separate shear/taper values for each layer within the lattice, which allowed the artists to create bent, organic shapes. Almost immediately, the 'one lattice per object' notion was tossed out: we ended up with landscapes consisting of 500+ individual lattices, some consisting of just a few highly deformed voxels. Drawing with voxels, you see, is horrible. The artists usually ended up just filling each lattice completely to the brim and then shaping it to fit, with the individual voxels only really coming into play as the landscape was destroyed.
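As a rough illustration of how such a deformation stack could be composed, here is a minimal sketch. It is not the actual W3D code: the `Layer` fields, the `deform` function, and the choice to treat the heightmap as a vertical offset are all assumptions made for the example.

```python
from dataclasses import dataclass

@dataclass
class Layer:
    """Hypothetical per-layer shear/taper values for one voxel layer."""
    taper_x: float = 1.0   # scale applied to x within this layer
    shear_x: float = 0.0   # offset applied to x within this layer
    taper_y: float = 1.0
    shear_y: float = 0.0

def deform(point, layers, heightmap):
    """Map a lattice-local point (z >= 0, one unit per layer) to deformed space.

    Assumed order: heightmap first, then the per-layer shear/taper.
    """
    x, y, z = point
    # Pick the layer from the undeformed height, clamped to the top layer.
    layer = layers[min(int(z), len(layers) - 1)]
    # Assumption: the heightmap simply raises or lowers the column at (x, y).
    z_out = z + heightmap(x, y)
    x_out = x * layer.taper_x + layer.shear_x
    y_out = y * layer.taper_y + layer.shear_y
    return (x_out, y_out, z_out)

layers = [Layer(), Layer(taper_x=0.5, shear_x=2.0)]
slope = lambda x, y: 0.25 * x
print(deform((2.0, 1.0, 1.5), layers, slope))  # (3.0, 1.0, 2.0)
```

Giving every layer its own values is what lets a stack of layers lean or pinch progressively, which is one way to read the "bent, organic shapes" described above.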
The texturing problem was overcome by dividing the generated geometry into top, bottom and side polygons, with a separate custom texture projection onto each surface. The projection occurred after the heightmap was applied but before the edge-deformation tapering, so brickwork would follow the curved shapes of the walls.
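One way to picture the top/bottom/side split is to classify each polygon by its normal and pick a projection accordingly. The sketch below assumes simple planar projections and a hypothetical `threshold` of 0.7; the real projections were custom per surface, so treat the function names and formulas as illustrative only.

```python
def classify(normal, threshold=0.7):
    """Sort a polygon into 'top', 'bottom' or 'side' by its unit normal's z."""
    _, _, nz = normal
    if nz > threshold:
        return "top"
    if nz < -threshold:
        return "bottom"
    return "side"

def project_uv(pos, normal, scale=1.0):
    """Return (surface kind, (u, v)) for a vertex.

    pos is the vertex position after the heightmap but before tapering,
    so the UVs travel with the vertex when the taper later bends the wall.
    """
    x, y, z = pos
    kind = classify(normal)
    if kind in ("top", "bottom"):
        return kind, (x * scale, y * scale)   # project straight down
    nx, ny, _ = normal
    if abs(nx) >= abs(ny):
        return kind, (y * scale, z * scale)   # wall facing along x
    return kind, (x * scale, z * scale)       # wall facing along y

print(project_uv((1, 2, 3), (0, 0, 1)))   # ('top', (1.0, 2.0))
print(project_uv((1, 2, 3), (1, 0, 0)))   # ('side', (2.0, 3.0))
```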
Rather than one material per voxel, we allowed two, and had a per-corner blend value to permit smooth transitions. We also discarded the raw marching boxes algorithm in favour of a bespoke system which optionally crenellated the walls and rolled off the edges of each lattice - each voxel was essentially divided into four and skinned with polygons accordingly. Finally we added details like grass edges.
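The two-material blend can be sketched as a bilinear interpolation of the four corner values across a voxel face, followed by a mix of the two materials. This is an assumed formulation for illustration; `bilerp`, `blend_materials` and the RGB tuples are hypothetical names and data, not the engine's own.

```python
def bilerp(c00, c10, c01, c11, u, v):
    """Bilinearly interpolate four corner values at (u, v) in [0, 1]^2."""
    bottom = c00 + (c10 - c00) * u
    top = c01 + (c11 - c01) * u
    return bottom + (top - bottom) * v

def blend_materials(colour_a, colour_b, corner_blend, u, v):
    """Mix two material colours using per-corner blend weights.

    corner_blend holds the weights at (0,0), (1,0), (0,1), (1,1);
    0 means pure material A and 1 means pure material B.
    """
    t = bilerp(*corner_blend, u, v)
    return tuple(a + (b - a) * t for a, b in zip(colour_a, colour_b))

grass = (0.0, 0.0, 0.0)   # stand-in colours for the example
rock = (1.0, 0.5, 0.0)
# Blend runs from pure A on the left edge to pure B on the right edge.
print(blend_materials(grass, rock, (0, 1, 0, 1), 0.5, 0.5))  # (0.5, 0.25, 0.0)
```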
As you can imagine, in all this the lovely simple collision detection you're enjoying went completely out of the window. We also had an even worse problem: we couldn't afford to drop frames on the PS2, but at the same time we couldn't regenerate the landscape fast enough if a huge explosion went off. We had to timeslice the regeneration - which meant that we couldn't use the landscape geometry for collision. During regeneration, it would be leaky. We had to use the raw voxel data. Fortunately, the poxel deformations I'd devised were reversible, so you could tell reasonably quickly whether a given point in space was inside a poxel. Surface normal calculation, on the other hand, was indescribably hideous.
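To show why reversibility matters for collision, here is a minimal sketch of a point-in-poxel test against raw voxel data. It assumes a simplified deformation (heightmap as a vertical offset, then a per-layer taper and shear with non-zero taper); the functions `undeform` and `inside_poxel` and the tuple layout are hypothetical.

```python
import math

def undeform(world, layers, heightmap):
    """Invert the assumed deformation; return lattice-local coords or None.

    layers is a list of (taper_x, shear_x, taper_y, shear_y) tuples.
    Each layer is tried in turn, and a candidate is accepted only if
    undoing that layer's taper/shear lands the point back inside it.
    """
    wx, wy, wz = world
    for k, (taper_x, shear_x, taper_y, shear_y) in enumerate(layers):
        x = (wx - shear_x) / taper_x
        y = (wy - shear_y) / taper_y
        z = wz - heightmap(x, y)
        if k <= z < k + 1:
            return (x, y, z)
    return None

def inside_poxel(world, layers, heightmap, solid):
    """True if the point falls in a filled voxel of this lattice.

    solid is a set of (i, j, k) indices of filled voxels.
    """
    local = undeform(world, layers, heightmap)
    if local is None:
        return False
    x, y, z = local
    return (math.floor(x), math.floor(y), math.floor(z)) in solid

layers = [(1.0, 0.0, 1.0, 0.0), (0.5, 2.0, 1.0, 0.0)]
slope = lambda x, y: 0.25 * x
print(undeform((3.0, 1.0, 2.0), layers, slope))                   # (2.0, 1.0, 1.5)
print(inside_poxel((3.0, 1.0, 2.0), layers, slope, {(2, 1, 1)}))  # True
```

Because the test reads only the voxel set and the deformation parameters, it stays valid while the rendered geometry is partway through a timesliced rebuild. The sketch returns the first layer that matches; a real system would have to decide what to do if strongly differing layers overlap.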
Then there was the self-shadowing - again, not something we could just omit because it was hard.
Each lattice chunk had a list of shadow dependencies - chunks that needed to be rechecked if that chunk was damaged. On top of that we ended up caching which voxel was occluding each vertex in the landscape, so that after an explosion we could very, very quickly check whether any given vertex was affected.
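The caching idea can be sketched with two small lookups: one from each chunk to the chunks that depend on it for shadows, and one from each shadowed vertex to the voxel that occludes it. The class and method names below are hypothetical, and the data layout is an assumption rather than the shipped structure.

```python
class ShadowCache:
    """Remembers which voxel blocks the light for each shadowed vertex."""

    def __init__(self):
        self.occluder_of = {}   # vertex id -> (chunk id, voxel index)
        self.dependents = {}    # chunk id -> chunks whose shadows it can affect

    def record(self, vertex, occluder):
        """Store the occluding voxel found when the vertex was last lit."""
        self.occluder_of[vertex] = occluder

    def needs_recheck(self, vertex, destroyed):
        """True if the vertex's cached occluder was just destroyed.

        Vertices with no cached occluder are already lit, and removing
        voxels cannot put them into shadow, so they are skipped.
        """
        return self.occluder_of.get(vertex) in destroyed

    def chunks_to_visit(self, damaged_chunk):
        """The damaged chunk plus every chunk listed as depending on it."""
        return [damaged_chunk] + self.dependents.get(damaged_chunk, [])

cache = ShadowCache()
cache.dependents["house"] = ["garden"]
cache.record(1, ("house", (0, 0, 0)))
cache.record(2, ("house", (3, 0, 1)))
destroyed = {("house", (0, 0, 0))}
print([v for v in (1, 2, 3) if cache.needs_recheck(v, destroyed)])  # [1]
print(cache.chunks_to_visit("house"))  # ['house', 'garden']
```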
In W4 Mayhem, I added point light sources. They couldn't overlap, but they cast shadows and all the rest of it - a humongous ball-ache, frankly, but it looked nice. I also added a simple low-res heightmap under the whole scene to provide a cheap, area-filling base upon which landscapes could be constructed.
In short, to get a voxel engine that actually looked like a proper game and wouldn't give the artists an embolism, I had to prioritise the features they needed: control over texturing and shape. Everything else had to be engineered around that. I'm actually really proud of the result - second only to the worm animation system in W4.
For that reason, I'd strongly advise you to think about what your engine is going to be used for - and by whom - before piling too many features on top.
Anyway, hope that helps a bit.