Home made containers

What it says on the tin: a place to discuss proposed new features.
Caius
Halfling
Posts: 52
Joined: Fri Oct 06, 2006 9:00 pm

Home made containers

Post by Caius »

Cheers,

This is not really a feature request, but an optimization suggestion.
A while ago I read a debate on a French C++ forum about the standard containers; some people argued that in performance-critical code, custom containers pay off in meeting performance needs. The standard containers were designed with safety in mind and require some extra computation.
Also, I had a look at the G3D library, which implements its own containers for that purpose.

I don't know how doable it would be for Ogre; it shouldn't be that hard, since typedefs are used to rename the containers, and with a similar interface such containers could improve collection processing.
The idea has probably already been thought of; maybe there's a reason, and I wonder why Ogre wouldn't. =)

thanks
CABAListic
OGRE Retired Team Member
Posts: 2903
Joined: Thu Jan 18, 2007 2:48 pm
x 58

Post by CABAListic »

"long time ago" may be the key word here. Now I won't argue that the STL containers were certainly not designed with games in mind and possibly carry some overhead one could get rid of. But, recent STL implementations are still heavily optimised and written by people with a lot of insight into compilers.

Writing a custom container is not easy, but it's doable for any slightly (or "slightly"?) more experienced programmer. It's a tedious and possibly error-prone process, though. But writing one that matches or even exceeds the STL's performance? That's a losing battle for almost everyone, I'd say. Ok, you could write a hash map and beat std::map, but then, std::tr1::unordered_map is on its way ;)

The one thing where you actually *can* beat the standard containers is memory allocation strategy. But the STL knows this and offers hooks for custom allocators, so you only need to supply the allocators, not write your own containers. And IIRC, that's what one of the Google projects is going to bring into Ogre.
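
For illustration, a minimal allocator along these lines might look like the sketch below. "PoolAllocator" is just a made-up name, and it simply forwards to malloc/free where a real one would draw from a pre-sized pool; the point is that the container stays std::vector, only the allocation strategy changes.

#include <cstddef>   // std::size_t, std::ptrdiff_t
#include <cstdlib>   // std::malloc, std::free
#include <new>       // placement new
#include <vector>

// Minimal C++03-style allocator sketch: same interface the standard
// containers expect, with the allocation strategy swapped out.
template <typename T>
class PoolAllocator
{
public:
    typedef T              value_type;
    typedef T*             pointer;
    typedef const T*       const_pointer;
    typedef T&             reference;
    typedef const T&       const_reference;
    typedef std::size_t    size_type;
    typedef std::ptrdiff_t difference_type;

    template <typename U> struct rebind { typedef PoolAllocator<U> other; };

    PoolAllocator() {}
    template <typename U> PoolAllocator(const PoolAllocator<U>&) {}

    pointer address(reference x) const { return &x; }
    const_pointer address(const_reference x) const { return &x; }

    pointer allocate(size_type n, const void* = 0)
    {
        // Replace with pooled / aligned allocation in a real implementation.
        return static_cast<pointer>(std::malloc(n * sizeof(T)));
    }
    void deallocate(pointer p, size_type) { std::free(p); }

    void construct(pointer p, const T& value) { new (p) T(value); }
    void destroy(pointer p) { p->~T(); }

    size_type max_size() const { return size_type(-1) / sizeof(T); }
};

// Stateless allocators of the same kind always compare equal.
template <typename T, typename U>
bool operator==(const PoolAllocator<T>&, const PoolAllocator<U>&) { return true; }
template <typename T, typename U>
bool operator!=(const PoolAllocator<T>&, const PoolAllocator<U>&) { return false; }

// Usage: same container, different allocation strategy.
typedef std::vector<float, PoolAllocator<float> > FloatVector;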
Praetor
OGRE Retired Team Member
Posts: 3335
Joined: Tue Jun 21, 2005 8:26 pm
Location: Rochester, New York, US
x 3

Post by Praetor »

This has cropped up before. Someone from EA actually sparked a nice discussion. The regular STL has worked really well so far. Check out that discussion here:
http://www.ogre3d.org/phpBB2/viewtopic. ... ight=eastl
Caius
Halfling
Posts: 52
Joined: Fri Oct 06, 2006 9:00 pm

Post by Caius »

Very interesting discussion indeed, I've learned some stuff there =)
* Some STL implementations (especially Microsoft STL) have inferior performance characteristics that make them unsuitable for game development. EASTL is faster than all existing STL implementations.
Most of us are using the Microsoft STL, right? :wink:

Here is the paper on EASTL he was talking about.
"long time ago" may be the key word here. Now I won't argue that the STL containers were certainly not designed with games in mind and possibly carry some overhead one could get rid of. But, recent STL implementations are still heavily optimised and written by people with a lot of insight into compilers.
Yes, my C++ knowledge is quite outdated, I must say.
I was googling for optimized containers and haven't found any, except this recent article, but it takes a very different approach.
Maybe I should have posted this in "Back to basics" :lol:

@Praetor: by the way, look at your pm =)
white tiger
Kobold
Posts: 38
Joined: Thu Jul 26, 2007 4:40 pm

Post by white tiger »

A while ago I read a debate on a French C++ forum about the standard containers; some people argued that in performance-critical code, custom containers pay off in meeting performance needs. The standard containers were designed with safety in mind and require some extra computation.
Also, I had a look at the G3D library, which implements its own containers for that purpose.


Irrlicht uses its own container implementation. They rewrote PART of the STL (so not all STL-like containers are available) in their engine because their implementation was supposed to be faster than the standard STL.

Well, some time ago I wrote a converter using Irrlicht (before switching to OGRE). It simply converts a mesh loadable by Irrlicht to .pov (the POV-Ray format, a free raytracer).

It took 1 minute to convert a "simple" mesh with "few" polys.

To make it short, I added cout debug lines and found out it was slow when writing the triangle data. Then I substituted all irr::core::stringc with std::string (I forgot to mention that POV files are ASCII and I used a vector of strings to represent them).

It took only a few seconds and the conversion was done, a lot faster than with Irrlicht's STL-like implementation. I think Irrlicht's "int to string" conversion is slow. In the new implementation I use

ostringstream ostr;
ostr << int_var;
return ostr.str();

(using the STL) to do the conversion, and it's faster. I ended up substituting all irr::core::stringc with std::string and all irr::core::array with std::vector.
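
For reference, wrapped up as a self-contained helper it looks something like this (just a sketch, the function name is arbitrary):

#include <sstream>
#include <string>

// Convert an integer to its decimal string representation via a stream.
std::string intToString(int value)
{
    std::ostringstream ostr;
    ostr << value;
    return ostr.str();
}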

So I think rewriting the STL is only a waste of time and reinventing the wheel. And your implementation can easily end up much slower than the default STL, or at best only equal.

And in a game, what makes the FPS and performance difference is shaders, physics and AI; surely not the container implementation.
Caius
Halfling
Posts: 52
Joined: Fri Oct 06, 2006 9:00 pm

Post by Caius »

I'm not talking about rewriting the STL, just about containers; and when I say container, I'm thinking especially of vector.
The idea came after reading the G3D::Array documentation:

Array is highly optimized compared to std::vector. Array operations are less expensive than on std::vector and for large amounts of data, Array consumes only 1.5x the total size of the data, while std::vector consumes 2.0x. The default array takes up zero heap space. The first resize (or append) operation grows it to a reasonable internal size so it is efficient to append to small arrays. Memory is allocated using System::alignedMalloc, which produces pointers aligned to 16-byte boundaries for use with SSE instructions and uses pooled storage for fast allocation. When Array needs to copy data internally on a resize operation it correctly invokes copy constructors of the elements (the MSVC6 implementation of std::vector uses realloc, which can create memory leaks for classes containing references and pointers). Array provides a guaranteed safe way to access the underlying data as a flat C array -- Array::getCArray. Although (T*)std::vector::begin() can be used for this purpose, it is not guaranteed to succeed on all platforms.
G3D is BSD-licensed, so there's already something written out there that might be adaptable for Ogre and might provide a significant performance improvement.
Praetor
OGRE Retired Team Member
Posts: 3335
Joined: Tue Jun 21, 2005 8:26 pm
Location: Rochester, New York, US
x 3

Post by Praetor »

Caius wrote: G3D is BSD-licensed, so there's already something written out there that might be adaptable for Ogre and might provide a significant performance improvement.
Be careful with words like "significant" when talking about performance. A common trap is that even if it provides significant performance benefits over std::vector in a simple benchmark, other factors play much larger roles in a large project like Ogre. The performance benefits that at first seemed large can quickly become meaningless, and in many cases not even worth the effort. Performance optimization is a tricky thing. Only if std::vector operations are causing significant slowdowns will replacing it provide significant improvements.
Caius
Halfling
Posts: 52
Joined: Fri Oct 06, 2006 9:00 pm

Post by Caius »

Be careful with words like "significant" when talking about performance.
Yeah, that's why I put "might" in front of it; maybe I should have put "may", "could", "probably", or "I have no idea, but on paper it looks great" :lol:. My English is not that precise either =)
I know I'm far from being an expert when it comes to these low-level subjects, but I try to bring up ideas I find interesting.
Actually, I should probably have put all this in "Developer talk"; this kind of idea always leads to heavy debate.

curious about what you guys think. :)
Praetor
OGRE Retired Team Member
Posts: 3335
Joined: Tue Jun 21, 2005 8:26 pm
Location: Rochester, New York, US
x 3

Post by Praetor »

I find it difficult to believe that Ogre has not undergone some rigorous performance analysis before. Some searching of the forum would probably turn up some profiling results. If all else fails, you can profile Ogre yourself. I use AMD's free CodeAnalyst profiler. Run some of the demos through it, and it will tell you which pieces of Ogre take up the most time. Profiling is a great way of seeing your or others' code in action.
Caius
Halfling
Posts: 52
Joined: Fri Oct 06, 2006 9:00 pm

Post by Caius »

I use AMD's free CodeAnalyst profiler
Yeah, that's the kind of thing I need :)
Does it work on Intel?
Praetor
OGRE Retired Team Member
Posts: 3335
Joined: Tue Jun 21, 2005 8:26 pm
Location: Rochester, New York, US
x 3

Post by Praetor »

I have no idea. You can get a timed evaluation of Intel's VTune if it doesn't. I run AMD on my desktop, so I haven't had to face that issue before.
Caius
Halfling
Posts: 52
Joined: Fri Oct 06, 2006 9:00 pm

Post by Caius »

I'll do some research, thanks for the input.
I find it difficult to believe that Ogre has not undergone some rigorous performance analysis before.
By the way, I'm absolutely not criticizing or underestimating Ogre's performance or anything; I love it as much as anyone here, and I would just be glad to contribute to it.
Also, I'm a very :idea: type of guy, and I tend to throw my thoughts into the arena maybe too early :lol:
I'm going to do some research on the question.
Praetor
OGRE Retired Team Member
Posts: 3335
Joined: Tue Jun 21, 2005 8:26 pm
Location: Rochester, New York, US
x 3

Post by Praetor »

Actually, criticize away! It can only lead to helping Ogre, helping you, or both. But I wasn't taking your questions as criticisms anyway. I'm just saying that I think someone must have put Ogre through a profiler or two at some point, and the information may have made its way onto the forum.
sinbad
OGRE Retired Team Member
Posts: 19269
Joined: Sun Oct 06, 2002 11:19 pm
Location: Guernsey, Channel Islands
x 66

Post by sinbad »

We're always open to suggestions, but personally I think the major reason not to use the STL is that consoles have had awful implementations of it in the past. And there are always memory allocation issues, which are a lot tighter on consoles.

Basically our motto is not to reinvent unless there's a damn good reason. Having to write, test and maintain custom containers when the STL is actually incredibly powerful requires a very strong case.

Personally I find this a very odd comment though:
Array provides a guaranteed safe way to access the underlying data as a flat C array -- Array::getCArray. Although (T*)std::vector::begin() can be used for this purpose, it is not guaranteed to succeed on all platforms.
Whilst he is technically correct that std::vector is not guaranteed by the standard to be a contiguous area of memory, I have never seen or heard of a single platform where this is not the case, and I repeatedly see it referenced as a contiguous block. It would take a seriously insane STL implementor not to preserve the contiguity, because everyone assumes this is the case in all implementations.
Kojack
OGRE Moderator
Posts: 7157
Joined: Sun Jan 25, 2004 7:35 am
Location: Brisbane, Australia
x 534

Post by Kojack »

Whilst he is technically correct that std::vector is not guaranteed by the standard to be a contiguous area of memory
Well, the C++ standard from 2003 says:
23.2.4 Class template vector [lib.vector]
1 A vector is a kind of sequence that supports random access iterators. In addition, it supports (amortized) constant time insert and erase operations at the end; insert and erase in the middle take linear time. Storage management is handled automatically, though hints can be given to improve efficiency. The elements of a vector are stored contiguously, meaning that if v is a vector<T, Allocator> where T is some type other than bool, then it obeys the identity &v[n] == &v[0] + n for all 0 <= n < v.size().
Bjarne Stroustrup says that it was always intended that way even if old versions of the standard didn't say it, and all implementations he knew of did it that way.
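
For example, that identity is what makes it safe to hand a vector's storage to a C-style API expecting a flat array. A quick sketch (fill_buffer is just a made-up stand-in for such an API):

#include <cassert>
#include <cstddef>
#include <vector>

// Stand-in for any C-style function that expects a flat array.
void fill_buffer(float* data, std::size_t count)
{
    for (std::size_t i = 0; i < count; ++i)
        data[i] = static_cast<float>(i);
}

int main()
{
    std::vector<float> v(16);

    // The guarantee quoted above: element n lives exactly n slots past element 0.
    assert(&v[5] == &v[0] + 5);

    // Which is why &v[0] can be passed as a flat C array
    // (for a non-empty vector of anything other than bool).
    fill_buffer(&v[0], v.size());
    return 0;
}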
sinbad
OGRE Retired Team Member
Posts: 19269
Joined: Sun Oct 06, 2002 11:19 pm
Location: Guernsey, Channel Islands
x 66

Post by sinbad »

Ah right, I didn't realise they'd added it to the standard, that's good.
Caius
Halfling
Posts: 52
Joined: Fri Oct 06, 2006 9:00 pm

Post by Caius »

I attempted to benchmark G3D's Array against std::vector, but I couldn't actually figure out how to get G3D's to work, because of linking issues; some libs might not have been precompiled, I don't know... the container has dependencies that have dependencies, and so on. I would have had to learn how to use the whole library just to use its Array :lol:
However, after thinking for a while, I ended up coming around to CABAListic's wise point:
The one thing where you actually *can* beat the standard containers is memory allocation strategy.
He's damn right :wink:
I've seen that someone is already working on it; we'll wait and see.

About vector, I was wondering...
The point of the memory being contiguous, beyond avoiding memory fragmentation, is to iterate faster, right?
I'm wondering if a vector of pointers somehow defeats the purpose of the vector. In something like vector<double*>, only the pointers are contiguous, not the pointed-to values. In that case there's no gain over a linked list, right? (except random access). Yeah, I know it's a "back to basics" question =)
sinbad
OGRE Retired Team Member
Posts: 19269
Joined: Sun Oct 06, 2002 11:19 pm
Location: Guernsey, Channel Islands
x 66

Post by sinbad »

It's mostly about cache coherency and being able to set up chunks faster (e.g. memset / memcpy). A vector of pointers loses much of these benefits, yes, unless most of the manipulation is done by bulk copying / reorganising those pointers, in which case it can be faster to block-copy pointers but keep the instances independent than to block-copy by-value objects. It's very much a case-by-case thing.
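
To make the cache point concrete, here's a rough sketch (the function names are made up for illustration): the first loop sweeps one contiguous block, while the second chases a separate heap allocation per element.

#include <cstddef>
#include <vector>

// Contiguous storage: one cache-friendly sweep over the data.
double sumByValue(const std::vector<double>& values)
{
    double total = 0.0;
    for (std::size_t i = 0; i < values.size(); ++i)
        total += values[i];
    return total;
}

// Only the pointers are contiguous; each element lives in its own heap
// allocation, so every dereference is a potential cache miss.
double sumByPointer(const std::vector<double*>& values)
{
    double total = 0.0;
    for (std::size_t i = 0; i < values.size(); ++i)
        total += *values[i];
    return total;
}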
grhodes
Bronze Sponsor
Posts: 11
Joined: Tue Jun 19, 2007 6:13 pm
Location: Raleigh, NC

Post by grhodes »

Sinbad's comment about some consoles not having good STL implementations is interesting... we had a source code license for a commercial game engine, one that works on all of the past and current generations of major game consoles, which implemented its own containers, probably for this reason.

Internally, we use the STL, but we are very careful about how we use it. Pete Isensee from Microsoft has given a few quite interesting talks at GDC about C++ optimization and performance, with discussions of the Microsoft Game Group's measurements of the STL vs. custom containers. VERY interesting commentary from Pete that really sheds some light on whether people are actually able to do better than the STL in practice. His analysis doesn't provide a specific recommendation for all situations, but it is worth reading:

http://gdconf.com/conference/archives/2 ... e_pete.ppt

Also of interest:

http://www.tantalon.com/pete/gdc2001rou ... report.htm

Pete has a few other performance optimization articles... see his website, as referenced in the links above, for more, including discussions of using custom allocators and the like.
Graham Rhodes
bibiteinfo
Gremlin
Posts: 197
Joined: Wed Apr 12, 2006 2:48 pm
Location: Montreal, Canada

Post by bibiteinfo »

Caius wrote:
I use AMD's free CodeAnalyst profiler
Yeah, that's the kind of thing I need :)
Does it work on Intel?
CodeAnalyst only works on AMD, and VTune only works on Intel.