Epic wheel reinventing: CUDA Software Rasterizer?
- _tommo_
- Gnoll
- Posts: 677
- Joined: Tue Sep 19, 2006 6:09 pm
- x 5
- Contact:
Epic wheel reinventing: CUDA Software Rasterizer?
Hi,
as the title says, I've been thinking about this for a while... everyone wants to use CUDA for scientific calculations, raytracing, fluids, etc...
but I've never seen anyone who wants to use CUDA to do what GPUs are made for
I think that reinventing the wheel like this could teach more than simply learning DirectX, and maybe it could be useful in the future, when GPUs become more and more CPU-like (Larrabee?)... also, if OpenCL really works, you could have a multi-everything renderer with much less effort.
Another question is: is it possible to gain performance by programming the whole pipeline yourself, or does hardware optimization always win?
I had the feeling that the newer uber-powerful cards, especially in multi-GPU setups, really suffer from API and driver constraints...
Anyway, I started thinking about a possible test renderer, because I'm getting a CUDA GPU soon;
I know that all of this may well perform horribly or not be possible at all, because I've thought about it only from reading the CUDA documentation
Anyway, I'm interested in what you think about this: whether (apart from the actual implementation) it's a useful or possible approach, even for the distant future, or if it's just FAIL
EDIT JUNE 2009:
Actually it was possible, a test version of my engine runs at 70 FPS
Screenshot in the last post!
Last edited by _tommo_ on Tue Jun 23, 2009 5:56 pm, edited 2 times in total.
-
- Silver Sponsor
- Posts: 597
- Joined: Sun Jan 07, 2007 11:55 pm
- Location: Cologne, Germany
- Contact:
Re: Epic wheel reinventing: CUDA Software Rasterizer?
You won't be able to outperform the hardware pipeline simply because you can't use the fixed function stages of the hardware. Even though these parts don't look large in terms of transistor count compared to all the shader processors, I dare say they make a huge difference for performance. The gain would be a pipeline more flexible than the hardware one. However, an efficient implementation would be a monumental programming effort.
Enough is never enough.
- _tommo_
- Gnoll
- Posts: 677
- Joined: Tue Sep 19, 2006 6:09 pm
- x 5
- Contact:
Re: Epic wheel reinventing: CUDA Software Rasterizer?
and Monumental > Epic?
In fact, that's what I feared... for example, I wouldn't know how to write the rasterizer in CUDA: naively, there would be a worst case where it has to loop over all the pixels, if a triangle covers the whole screen... and that would kill performance.
But many things would be faster: for example, you don't have batches, streaming, or occluded pixels.
And maybe there's a way to access the FF parts from inside CUDA. After all, they are right there.
For me it's interesting to try...
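To make the worst case concrete, here is a minimal CPU-side sketch in C++ of the naive approach: test every pixel center inside the triangle's screen-clipped bounding box against three edge functions. It is not code from this thread; the names `Vec2`, `RasterStats`, `edgeFn` and `rasterize` are made up for illustration, and a CUDA version would presumably spread the two loops across threads instead of running them serially.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

struct Vec2 { float x, y; };

struct RasterStats {
    long tested;   // pixel centers checked against the triangle
    long covered;  // pixel centers that were actually inside it
};

// Edge function: positive on one side of the line a->b, negative on the other.
inline float edgeFn(Vec2 a, Vec2 b, Vec2 p) {
    return (b.x - a.x) * (p.y - a.y) - (b.y - a.y) * (p.x - a.x);
}

// Naive rasterizer: walks the triangle's bounding box, clipped to the screen.
inline RasterStats rasterize(Vec2 v0, Vec2 v1, Vec2 v2, int width, int height) {
    const int minX = std::max(0, static_cast<int>(std::floor(std::min({v0.x, v1.x, v2.x}))));
    const int maxX = std::min(width - 1, static_cast<int>(std::ceil(std::max({v0.x, v1.x, v2.x}))));
    const int minY = std::max(0, static_cast<int>(std::floor(std::min({v0.y, v1.y, v2.y}))));
    const int maxY = std::min(height - 1, static_cast<int>(std::ceil(std::max({v0.y, v1.y, v2.y}))));

    RasterStats stats = {0, 0};
    for (int y = minY; y <= maxY; ++y) {
        for (int x = minX; x <= maxX; ++x) {
            const Vec2 p = {x + 0.5f, y + 0.5f};  // sample at the pixel center
            const float w0 = edgeFn(v1, v2, p);
            const float w1 = edgeFn(v2, v0, p);
            const float w2 = edgeFn(v0, v1, p);
            // Accept either winding order: all three signs must agree.
            const bool inside = (w0 >= 0 && w1 >= 0 && w2 >= 0) ||
                                (w0 <= 0 && w1 <= 0 && w2 <= 0);
            ++stats.tested;
            if (inside) ++stats.covered;  // a real renderer would shade the pixel here
        }
    }
    return stats;
}
```

A triangle that covers a 64x48 screen makes `rasterize` test all 3072 pixels, while a small one only pays for its bounding box, which is exactly the worst case described above.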
-
- Google Summer of Code Student
- Posts: 1005
- Joined: Wed Jan 08, 2003 9:15 pm
- Location: Lyon, France
- x 49
- Contact:
Re: Epic wheel reinventing: CUDA Software Rasterizer?
jjp wrote: You won't be able to outperform the hardware pipeline simply because you can't use the fixed function stages of the hardware
Yep but on most recent cards the fixed function pipeline is just not present and emulated by the shader processors. The PS3 RSX does so, so I think all cards newer than a 7800 are doing this way.
- _tommo_
- Gnoll
- Posts: 677
- Joined: Tue Sep 19, 2006 6:09 pm
- x 5
- Contact:
Re: Epic wheel reinventing: CUDA Software Rasterizer?
Crashy wrote: Yep but on most recent cards the fixed function pipeline is just not present and emulated by the shader processors. The PS3 RSX does so, so I think all cards newer than a 7800 are doing this way.
Yes, but there's FF and FF
Maybe you mean the T&L fixed-function pipeline, the one used to set up materials, which has been completely removed from PS3.0 cards...
but GPUs still have a small fixed-function part that is not emulated by shaders, for example the rasterizers and the texture units.
These tasks have proven to be a perfect fit for FF... for example, according to Intel, Larrabee will still have texture units, because texturing is a major bottleneck when emulated on generic processors.
Anyway this is not a problem, because you can use Texture Units in CUDA for filtering and texturing...
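For a sense of what a texture unit saves you, here is a minimal C++ sketch of bilinear filtering done by hand on a single-channel texture with clamp-to-edge addressing. The `Texture` struct and the function names are assumptions for illustration only; real hardware also handles formats, wrapping modes and caching, none of which is modelled here.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct Texture {
    int width, height;
    std::vector<float> texels;  // row-major, one float per texel
};

// Reads one texel, clamping the coordinates to the texture edges.
inline float texelAt(const Texture& t, int x, int y) {
    x = std::max(0, std::min(t.width - 1, x));
    y = std::max(0, std::min(t.height - 1, y));
    return t.texels[static_cast<std::size_t>(y) * t.width + x];
}

// Bilinear sample; (u, v) are in texel units and texel centers sit at integer + 0.5.
inline float sampleBilinear(const Texture& t, float u, float v) {
    const float fx = u - 0.5f;
    const float fy = v - 0.5f;
    const int x0 = static_cast<int>(std::floor(fx));
    const int y0 = static_cast<int>(std::floor(fy));
    const float ax = fx - x0;  // horizontal blend weight in [0, 1)
    const float ay = fy - y0;  // vertical blend weight in [0, 1)
    // Four memory reads and three linear interpolations per sample.
    const float top = texelAt(t, x0, y0) * (1.0f - ax) + texelAt(t, x0 + 1, y0) * ax;
    const float bottom = texelAt(t, x0, y0 + 1) * (1.0f - ax) + texelAt(t, x0 + 1, y0 + 1) * ax;
    return top * (1.0f - ay) + bottom * ay;
}
```

Every filtered sample costs four reads plus the blend arithmetic when done in generic code, which is presumably why keeping this stage in fixed function is attractive.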
-
- Silver Sponsor
- Posts: 597
- Joined: Sun Jan 07, 2007 11:55 pm
- Location: Cologne, Germany
- Contact:
Re: Epic wheel reinventing: CUDA Software Rasterizer?
Crashy wrote: Yep but on most recent cards the fixed function pipeline is just not present and emulated by the shader processors. The PS3 RSX does so, so I think all cards newer than a 7800 are doing this way.
The graphics pipeline consists of more than shading. There are quite a few fixed function parts left (besides texture samplers)
Enough is never enough.
-
- Google Summer of Code Student
- Posts: 1005
- Joined: Wed Jan 08, 2003 9:15 pm
- Location: Lyon, France
- x 49
- Contact:
Re: Epic wheel reinventing: CUDA Software Rasterizer?
Sorry guys, I misunderstood what jjp said
- kutraj
- Halfling
- Posts: 61
- Joined: Mon Dec 15, 2008 2:25 am
Re: Epic wheel reinventing: CUDA Software Rasterizer?
Is there a fundamental purpose to attempt something like this?
Why not just give up and start writing a ray-tracer instead?
- _tommo_
- Gnoll
- Posts: 677
- Joined: Tue Sep 19, 2006 6:09 pm
- x 5
- Contact:
Re: Epic wheel reinventing: CUDA Software Rasterizer?
It's that I don't think raytracers are the future at all... just because GPUs have much more power doesn't mean that we can waste it
Using rasterization, I think you can get photorealistic rendering on GPUs like the GTX280 or HD4870... because a modified X1950 runs GOW2. But they would be barely capable of realistic realtime raytracing.
And I still haven't seen a fast raytracer that can beat Crysis in graphics quality... and variety. In fact, you always get close-ups of ultra-detailed scenes, amazing reflections, realistic shadows... but I don't see how it could be used in a real-world game.
Anyway the purpose would be "just to try", as I know that today this is totally useless. But, for example, there's a rumor that the PS4 will use Larrabee
And if it is true, things like this should become the standard.
-
- Silver Sponsor
- Posts: 597
- Joined: Sun Jan 07, 2007 11:55 pm
- Location: Cologne, Germany
- Contact:
Re: Epic wheel reinventing: CUDA Software Rasterizer?
_tommo_ wrote: because a modified X1950 runs GOW2.
OT: there is no modified X1950 inside the XBOX360. It is the first chip based on ATI's current architecture. You could say it is the predecessor of the Radeon HD2900
Enough is never enough.
- _tommo_
- Gnoll
- Posts: 677
- Joined: Tue Sep 19, 2006 6:09 pm
- x 5
- Contact:
Re: Epic wheel reinventing: CUDA Software Rasterizer?
Yeah you're right... anyway it's still basically DX9, and many times less powerful than recent GPUs.
So, for me, a console with one of them, used to its full potential, could get near photorealism... and in 2011-2012 those GPUs will cost much less than today.
- kutraj
- Halfling
- Posts: 61
- Joined: Mon Dec 15, 2008 2:25 am
Re: Epic wheel reinventing: CUDA Software Rasterizer?
Hmmm... now that you mention it, have you seen Intel's famed whitepaper on the Larrabee architecture?
I'm pretty skeptical about realtime ray-tracing myself - I saw an nVidia demo recently at an event in univ- they were running a 9800GT I think, and they showed their car-in-a-city demo with real-time ray-traced reflections (I'm sorry I cannot recollect the name of the demo).
It was nice, but I still did not see the 'Crysis' quality in it. I asked the guy who made the tech presentation before the demo, and he said they only were doing primary reflections (and refractions of course), and that they also had used some special spatial partitioning stuff which he didn't know about himself.
I think you can download the demo from nVidia's website if you want to try it out for yourself.
Besides, your idea seems interesting, but how would you implement it in CUDA? Run one kernel and feed it the entire octree/BSP tree structure with the vertices, indices and stuff?
Using your ideas, render to textures will no longer be 'different'
- volca
- Gnome
- Posts: 393
- Joined: Thu Dec 08, 2005 9:57 pm
- x 1
- Contact:
Re: Epic wheel reinventing: CUDA Software Rasterizer?
kutraj wrote: I'm pretty skeptical about realtime ray-tracing myself - I saw an nVidia demo recently at an event in univ- they were running a 9800GT I think, and they showed their car-in-a-city demo with real-time ray-traced reflections (I'm sorry I cannot recollect the name of the demo).
It was nice, but I still did not see the 'Crysis' quality in it. I asked the guy who made the tech presentation before the demo, and he said they only were doing primary reflections (and refractions of course), and that they also had used some special spatial partitioning stuff which he didn't know about himself.
OT: I saw a video of this... Nice, but that's all. I personally think rasterisation does not cut it any better than raytracing. It was chosen in the past because it's simpler to implement and is generally faster, but I see a great effort to push realism these days, and rasterisation will hit its head on the ceiling some day.
Ray tracing has its flaws as well though - it can perform rather well today, but only on a mostly static scene where nothing or nearly nothing moves, so there is little to no cost in recalculating the hierarchy structures. Maybe triangles are to blame here - after all, there could be dynamic structures that could be easily traversed (sparse voxels?).
I think the rasterisation emulation could be a nice idea. I'd be interested if this could mean you'd be able to do (for example) 360 degree projection, etc. - something normally only doable with cube maps. I don't know enough about CUDA though to judge whether this kind of thing would be possible...
- _tommo_
- Gnoll
- Posts: 677
- Joined: Tue Sep 19, 2006 6:09 pm
- x 5
- Contact:
Re: Epic wheel reinventing: CUDA Software Rasterizer?
kutraj wrote: It was nice, but I still did not see the 'Crysis' quality in it. I asked the guy who made the tech presentation before the demo, and he said they only were doing primary reflections (and refractions of course), and that they also had used some special spatial partitioning stuff which he didn't know about himself.
I think you can download the demo from nVidia's website if you want to try it out for yourself.
Besides, your idea seems interesting, but how would you implement it in CUDA? Run one kernel and feed it the entire octree/BSP tree structure with the vertices, indices and stuff?
Using your ideas, render to textures will no longer be 'different'
And that single car + environment pushed a 9800 GT to the limit... so IMHO the cost/quality ratio is really far from being convenient.
Anyway, I don't know exactly how it could be implemented; there are surely many ways, and one of the advantages of this approach is that you can choose the best one for your individual case...
The pipeline I thought of in the first post, for example, should be good for opaque, single-material objects + small levels, because you can feed the data once and then just keep updating the same buffer, passing different parameters... so that you can execute all the "vertex shaders" with a single draw call.
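One way to read this idea, sketched on the CPU in C++: all objects share one vertex buffer that is uploaded once, each vertex carries an object index, and only a small per-object parameter array changes per frame. The layout and names (`Vertex`, `ObjectParams`, `transformAll`) are hypothetical, and a scale plus offset stands in for a full 4x4 matrix to keep the sketch short.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Vec3 { float x, y, z; };

// Per-object parameters: the only data that changes from frame to frame here.
struct ObjectParams {
    float scale;
    Vec3 offset;
};

// Static vertex data, uploaded once; objectId selects the parameters to use.
struct Vertex {
    Vec3 position;
    int objectId;
};

// One pass over every vertex of every object. On the GPU each loop iteration
// would be one thread of a single kernel launch.
inline void transformAll(const std::vector<Vertex>& vertices,
                         const std::vector<ObjectParams>& params,
                         std::vector<Vec3>& out) {
    out.resize(vertices.size());
    for (std::size_t i = 0; i < vertices.size(); ++i) {
        const Vertex& v = vertices[i];
        const ObjectParams& p = params[static_cast<std::size_t>(v.objectId)];
        out[i] = {v.position.x * p.scale + p.offset.x,
                  v.position.y * p.scale + p.offset.y,
                  v.position.z * p.scale + p.offset.z};
    }
}
```

Moving an object then means rewriting one `ObjectParams` entry and running the same pass again, with no per-object call.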
You could also make very different pipelines that render in a completely different way... for example, you could have simple raytraced reflections, or procedural surfaces, etc...
I think you could also do 360 degree rendering: in fact, you set up the rasterizer, and you decide which pixel goes where
So they could be projected in any way, I think.
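As an example of "deciding which pixel goes where", here is a small C++ sketch that maps a view-space direction to a pixel of a latitude/longitude (equirectangular) image, which covers the full 360 degrees in one buffer. The convention that the camera looks down -z with +y up is an assumption, and a real software rasterizer would still have to deal with triangles that wrap around the image seam, which this sketch ignores.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

struct Vec3 { float x, y, z; };
struct Pixel { int x, y; };

// Maps a non-zero direction to a pixel of a width x height equirectangular image.
// Assumed convention: -z is straight ahead (image center), +y is up (top row).
inline Pixel projectEquirect(Vec3 d, int width, int height) {
    const float pi = 3.14159265358979f;
    const float len = std::sqrt(d.x * d.x + d.y * d.y + d.z * d.z);
    const float sinLat = std::max(-1.0f, std::min(1.0f, d.y / len));
    const float lon = std::atan2(d.x, -d.z);  // longitude in [-pi, pi], 0 = ahead
    const float lat = std::asin(sinLat);      // latitude in [-pi/2, pi/2]
    const float u = lon / (2.0f * pi) + 0.5f; // 0 = left edge, 1 = right edge
    const float v = 0.5f - lat / pi;          // 0 = top edge, 1 = bottom edge
    Pixel p;
    p.x = std::max(0, std::min(width - 1, static_cast<int>(u * width)));
    p.y = std::max(0, std::min(height - 1, static_cast<int>(v * height)));
    return p;
}
```

With a 360x180 target, the forward direction lands in the middle of the image, straight up lands on the top row and straight down on the bottom row.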
Maybe a thing like this would make things easier in the end, as everything becomes more high-level and standardized... as you don't have to care about the countless caveats and workarounds needed for a complete DX or OGL renderer.
For example, you can post-process anything at any time, and goodbye screen quads and render windows
- volca
- Gnome
- Posts: 393
- Joined: Thu Dec 08, 2005 9:57 pm
- x 1
- Contact:
Re: Epic wheel reinventing: CUDA Software Rasterizer?
I think this is the way graphic programming APIs will go anyway (especially if larrabee will be successful). It seems the APIs will be more flexible than they are today, with the code on the GPU being the most important thing. Somewhere I read that it will even be possible to do visibility evaluation/scene graph operations on GPU.
So trying to do this now using CUDA will not be so different from the thing you'll have to do a few years from now, I suppose
- kutraj
- Halfling
- Posts: 61
- Joined: Mon Dec 15, 2008 2:25 am
Re: Epic wheel reinventing: CUDA Software Rasterizer?
_tommo_ wrote: Maybe a thing like this would make things easier in the end, as everything becomes more high-level and standardized... as you don't have to care about the countless caveats and workarounds needed for a complete DX or OGL renderer.
Er... give up and use OGRE instead?
But yeah, the 'unified' rendering scheme could easily circumvent a lot of issues that we might face in today's APIs. I'm kinda really interested - are you planning to give this a shot anytime soon?
volca wrote: I think this is the way graphic programming APIs will go anyway (especially if larrabee will be successful). It seems the APIs will be more flexible than they are today, with the code on the GPU being the most important thing. Somewhere I read that it will even be possible to do visibility evaluation/scene graph operations on GPU.
So trying to do this now using CUDA will not be so different from the thing you'll have to do a few years from now, I suppose
I'm working on GPU accelerated scene-graph type operations. I'm trying to work out a scheme for virtual EM imaging - I'm using CUDA of course. Basically, the boundaries are blurring with CUDA - you can do almost anything you want as long as you take the pains (serious pains, trust me...) of managing memory correctly - and ultimately end up resorting to other 'hacks' to get stuff to work.
- _tommo_
- Gnoll
- Posts: 677
- Joined: Tue Sep 19, 2006 6:09 pm
- x 5
- Contact:
Re: Epic wheel reinventing: CUDA Software Rasterizer?
Don't worry, i'm not betraying Ogre
In fact, I'm sure that a thing like that would never be as real-life-complete as Ogre, so to make a real game it's always better to use it.
Anyway, I want to try it soon, when I have some brain-time (exams don't leave much) and a CUDA GPU. The fact is, I need DX10 only to try this engine, and I feel like it's a big cost for one single thing.
- volca
- Gnome
- Posts: 393
- Joined: Thu Dec 08, 2005 9:57 pm
- x 1
- Contact:
Re: Epic wheel reinventing: CUDA Software Rasterizer?
I'm not terribly informed about the state of GPU computing but it seems OpenCL enabled drivers should be available soon - maybe this would help you (getting rid of the DX10 dependency)?
- _tommo_
- Gnoll
- Posts: 677
- Joined: Tue Sep 19, 2006 6:09 pm
- x 5
- Contact:
Re: Epic wheel reinventing: CUDA Software Rasterizer?
OpenCL unfortunately requires MORE than DX10 - it will run on any CUDA card, but only on the HD4 series from AMD, which is 2 generations after DX10... but I still think it has great potential, also because everyone has accepted it as a standard.
- kutraj
- Halfling
- Posts: 61
- Joined: Mon Dec 15, 2008 2:25 am
Re: Epic wheel reinventing: CUDA Software Rasterizer?
@ _tommo_
You mean a DX10 capable card right? Or a DX10 dependency? I'd go with OpenGL, with only the framebuffer to write to.
I've got a (pretty old) 8600 on which I'm running my tests. I've written a pseudo ray-tracer already and have it functioning. Unfortunately, I cannot afford to buy one of those shiny Teslas or even the GTX280 yet... but my time will come...
The OpenCL specs were released late last year. IIRC nVidia has said that they'll have beta drivers sometime later this year. We could expect ATI/AMD to totally embrace it though, given that they're kinda lagging in the GPU computing division. But I guess it might still take a while for everything to be stable. I still find bugs with CUDA which nVidia are trying to iron out, and what, it's been a couple of years already?
And what of DX11's compute shaders? I haven't read the specs, but someone was mentioning them to me some time ago and it just popped up in this context.
- _tommo_
- Gnoll
- Posts: 677
- Joined: Tue Sep 19, 2006 6:09 pm
- x 5
- Contact:
Re: Epic wheel reinventing: CUDA Software Rasterizer?
Yeah, I meant DX10 capable, because I'm on Windows and I'm linking the two things
Maybe I'll find a way to swap my X1950 for a crappy 8600 for free... even if I'll get worse game performance.
A good thing about OpenCL is in fact that it is strongly supported outside Microsoft, and could make its way even onto next-gen (non-MS, I think) consoles... anyway, I think it's too early to make assumptions about its future support, because the first stable version hasn't even been announced... and as you say, it will probably be buggy and won't support the whole spec for a long while.
And on the other side you have DX, which doesn't want to lose its monopoly... in fact DX11 looks like an attempt to absorb the GPGPU target into standard DX development, although I'm not very convinced of the effectiveness of a hybrid API...
- _tommo_
- Gnoll
- Posts: 677
- Joined: Tue Sep 19, 2006 6:09 pm
- x 5
- Contact:
Re: Epic wheel reinventing: CUDA Software Rasterizer?
I finally got a lousy 8600 GT, so I started experimenting
So far today I've managed to create a simple blur filter on a texture loaded from file... anyway there are some design issues that bother me:
-the float2, 3, 4 built-in types have no built-in math, so I have to sum each single component... and this kills the SIMD design I think (and causes code bloat).
it's really strange also because they exist in shaders, and one would think that CUDA is more powerful.
-it lacks documentation; there are limits everywhere, and they are explained nowhere. Why does it do nothing if I go out of bounds? Why does it go out of bounds in the first place? How many blocks and threads can I create? Which functions are slower?
I'm still searching for a good documentation guide other than the SDK ones
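Both complaints can be illustrated with a short C++ sketch that runs on the CPU. The first half shows the usual workaround for the missing vector math, a few operator overloads on a three-float struct (`Float3` here is a stand-in, not CUDA's own `float3`; if memory serves, the SDK samples ship a helper header with overloads of this kind). The second half emulates a 1D launch: the block count is rounded up, so the last block contains threads past the end of the data, and each thread has to check its index itself. As far as I know, G80-class hardware executes scalar instructions per thread, so per-component code should not by itself defeat the parallelism, and devices of that generation are limited to 512 threads per block and 65535 blocks per grid dimension; check the programming guide for your card.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Float3 { float x, y, z; };  // stand-in for CUDA's float3

// Component-wise helpers written once, so call sites stay short.
inline Float3 operator+(Float3 a, Float3 b) { return {a.x + b.x, a.y + b.y, a.z + b.z}; }
inline Float3 operator*(Float3 a, float s) { return {a.x * s, a.y * s, a.z * s}; }

// Number of blocks needed to cover n elements (rounded up).
inline int blocksFor(int n, int threadsPerBlock) {
    return (n + threadsPerBlock - 1) / threadsPerBlock;
}

// CPU emulation of a 1D kernel launch that scales every element.
inline void scaleAll(std::vector<Float3>& data, float s, int threadsPerBlock) {
    const int n = static_cast<int>(data.size());
    const int blocks = blocksFor(n, threadsPerBlock);
    for (int block = 0; block < blocks; ++block) {
        for (int thread = 0; thread < threadsPerBlock; ++thread) {
            const int i = block * threadsPerBlock + thread;  // global thread index
            if (i >= n) continue;  // guard: surplus threads in the last block do nothing
            data[static_cast<std::size_t>(i)] = data[static_cast<std::size_t>(i)] * s;
        }
    }
}
```

For 1000 elements and 256 threads per block this launches 4 blocks, so 24 threads fall outside the array; without the guard they would read and write past the buffer.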
- _tommo_
- Gnoll
- Posts: 677
- Joined: Tue Sep 19, 2006 6:09 pm
- x 5
- Contact:
Re: Epic wheel reinventing: CUDA Software Rasterizer?
At last the experimenting brought me somewhere...
here's the eyecandy (or at least something not too ugly)
The scene is rendered in wireframe at 32 FPS on my 8600GT; it contains 345,000 polygons, 2 directional lights (one blue and one orange) with diffuse and specular and some pretty rim lighting
The lights are fully deferred, as the lighting is applied on the screen-space normal buffer... also it is natively HDR because all the buffers are just floats.
Also, the whole "light kernel" looks faster than a similar thing done in normal shaders.
I'm not too positive about the performance, because it still lacks polygons, textures and interpolation... but at least it works
As jjp prophesied months ago, the main bottleneck is the lack of FF: rasterization is the heaviest part of my pipeline, eating up 60% or more of frame time... clearly it would make good use of ROPs.
Maybe I could release a small demo, but I'm afraid that it would destroy other PCs
EDIT: I noticed I wasn't aligning buffers, now it reaches 70 FPS
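For readers wondering what a deferred "light kernel" over a normal buffer might look like, here is a CPU sketch in C++. It is not the engine's code: the Blinn-Phong specular term, the squared rim term, the white rim color and the fixed eye direction along +z are all assumptions picked for illustration. The output is left unclamped, which is all "natively HDR" needs when the buffers are floats.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct Vec3 { float x, y, z; };
inline Vec3 operator+(Vec3 a, Vec3 b) { return {a.x + b.x, a.y + b.y, a.z + b.z}; }
inline Vec3 operator*(Vec3 a, float s) { return {a.x * s, a.y * s, a.z * s}; }
inline float dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

struct DirLight {
    Vec3 toLight;  // unit vector from the surface towards the light
    Vec3 color;    // may exceed 1.0, nothing is clamped
};

// Shades one pixel from its unit view-space normal.
inline Vec3 shadePixel(Vec3 n, const std::vector<DirLight>& lights,
                       float shininess, float rimStrength) {
    const Vec3 toEye = {0.0f, 0.0f, 1.0f};  // assumed: the viewer looks down -z
    Vec3 result = {0.0f, 0.0f, 0.0f};
    for (const DirLight& light : lights) {
        const float diffuse = std::max(0.0f, dot(n, light.toLight));
        float specular = 0.0f;
        const Vec3 h = light.toLight + toEye;  // unnormalized half vector
        const float hLen = std::sqrt(dot(h, h));
        if (diffuse > 0.0f && hLen > 0.0f) {
            specular = std::pow(std::max(0.0f, dot(n, h) / hLen), shininess);
        }
        result = result + light.color * (diffuse + specular);
    }
    // Rim term: strongest where the normal is perpendicular to the view direction.
    const float rim = 1.0f - std::max(0.0f, dot(n, toEye));
    return result + Vec3{1.0f, 1.0f, 1.0f} * (rimStrength * rim * rim);
}

// The deferred pass: one independent shading call per pixel of the normal buffer.
inline void lightPass(const std::vector<Vec3>& normals, const std::vector<DirLight>& lights,
                      float shininess, float rimStrength, std::vector<Vec3>& hdrOut) {
    hdrOut.resize(normals.size());
    for (std::size_t i = 0; i < normals.size(); ++i) {
        hdrOut[i] = shadePixel(normals[i], lights, shininess, rimStrength);
    }
}
```

Because every pixel is shaded independently from the same small light list, the loop in `lightPass` maps naturally onto one thread per pixel.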
- :wumpus:
- OGRE Retired Team Member
- Posts: 3067
- Joined: Tue Feb 10, 2004 12:53 pm
- Location: The Netherlands
- x 1
Re: Epic wheel reinventing: CUDA Software Rasterizer?
_tommo_ wrote: Anyway this is not a problem, because you can use Texture Units in CUDA for filtering and texturing...
Indeed, the only significant fixed function parts that are not exposed (yet) are the triangle rasterizer, and raster operations like alpha blending, depth write/compare, etc. Software renderers will certainly come back.
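To show what emulating those raster operations involves, here is a minimal C++ sketch of a per-fragment depth compare, depth write and "source over" alpha blend. The `Framebuffer` layout and the `writeFragment` name are made up for this example, and it runs serially; on a GPU several threads can hit the same pixel at once, so a real implementation would need atomics or some per-pixel ownership scheme, which is part of what the fixed-function hardware provides for free.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Color { float r, g, b, a; };

struct Framebuffer {
    int width, height;
    std::vector<Color> color;
    std::vector<float> depth;  // cleared to 1.0 (far plane), smaller means closer
    Framebuffer(int w, int h)
        : width(w), height(h),
          color(static_cast<std::size_t>(w) * h, Color{0.0f, 0.0f, 0.0f, 0.0f}),
          depth(static_cast<std::size_t>(w) * h, 1.0f) {}
};

// Emulates the raster operations for one fragment.
// Returns true if the fragment passed the depth test and was written.
inline bool writeFragment(Framebuffer& fb, int x, int y, float z, Color src) {
    if (x < 0 || y < 0 || x >= fb.width || y >= fb.height) return false;
    const std::size_t i = static_cast<std::size_t>(y) * fb.width + x;
    if (!(z < fb.depth[i])) return false;  // depth compare: keep the closer fragment
    fb.depth[i] = z;                       // depth write
    const Color dst = fb.color[i];
    const float inv = 1.0f - src.a;
    fb.color[i] = {src.r * src.a + dst.r * inv,  // alpha blend: source over destination
                   src.g * src.a + dst.g * inv,
                   src.b * src.a + dst.b * inv,
                   src.a + dst.a * inv};
    return true;
}
```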
- volca
- Gnome
- Posts: 393
- Joined: Thu Dec 08, 2005 9:57 pm
- x 1
- Contact: