Friday, March 14, 2008

Rescinding the CPU+GPU argument

Andrew and I have been having quite an interesting discussion in the comments of the last article about the future of CPU+GPU processing, and I have been taking a much closer look at the topic over the last few days to develop a better understanding. What I have found so far has led me to the opposite of my previous conclusion: there will not be big benefits from the CPU+GPU combination in terms of processing power.

Having examined the CPU and GPU issue more closely, I have come to the realization that a CPU+GPU combination is likely a bad one, unless one is designing system-on-a-chip (SoC) processors for embedded or portable systems, and even then I expect the performance of these devices to fall far short of their desktop counterparts.

The main reasoning behind this realization is the high memory bandwidth that most GPUs demand for rendering frames. The memory on most video cards consists of high-performance RAM: either DDR2 or the more expensive GDDR3 (5.6~54.4 GB/sec) or GDDR4 (64~156.6 GB/sec) variants. The need for such high bandwidth comes from the nature of graphics rendering, where large datasets including textures, skins, and masks are stored in memory and frequently pasted or multiplied onto surfaces, not to mention 3D geometry transformations. A highly parallel GPU therefore needs fast access to memory to perform the wide variety of calculations involved in rendering a 3D environment. This level of bandwidth is generally not achievable on a standard motherboard front-side bus (FSB).
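To see where these demands come from, here is a back-of-envelope sketch. All of the inputs (resolution, frame rate, overdraw, texture reads per pixel) are illustrative assumptions of mine, not figures from any specification:

```python
# Rough estimate of the memory bandwidth needed just to shade frames.
# All parameter values are illustrative assumptions, not measurements.

def render_bandwidth_gb_per_sec(width, height, fps, bytes_per_pixel=4,
                                overdraw=4, texture_reads=2):
    """Each displayed pixel is written `overdraw` times, and each write
    also samples `texture_reads` texels of the same size."""
    pixels = width * height
    bytes_per_frame = pixels * bytes_per_pixel * overdraw * (1 + texture_reads)
    return bytes_per_frame * fps / 1e9

# A 1600x1200 display at 60 fps under these assumptions:
print(round(render_bandwidth_gb_per_sec(1600, 1200, 60), 1))  # prints 5.5
```

Even this simplistic model lands above the ~4 GB/sec an FSB can offer, before counting geometry data or anti-aliasing.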

To get an idea of the problem, one simply needs to look at the bandwidth the FSB provides: somewhere in the ballpark of 4 GB/sec for a HyperTransport 2.0 link on an AMD-based motherboard (I expect roughly the same from an Intel-based chipset). This pales in comparison to the memory bandwidth of a video card, and it is a core reason why integrated graphics processors on a motherboard fare so poorly at 3D rendering.
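Putting the quoted figures side by side makes the gap plain; the numbers below are simply the ones cited above:

```python
# Bandwidth figures quoted in the text, in GB/sec.
fsb = 4.0               # HyperTransport 2.0, approximate
gddr3 = (5.6, 54.4)     # (low end, high end)
gddr4 = (64.0, 156.6)   # (low end, high end)

# Even the slowest GDDR3 configuration beats the FSB,
# and high-end GDDR4 is roughly 40x faster.
print(gddr3[0] / fsb)
print(gddr4[1] / fsb)
```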

The obvious solution is to give these integrated graphics processors a specialized memory controller and high-speed RAM of their own. However, I expect that incorporating this hardware into the motherboard would easily add $50 to $100 in complexity and cost, which may make the motherboard look incredibly expensive to a purchaser. There is also the problem of the on-board graphics hardware becoming obsolete, and therefore useless, once the user buys a plug-in video card to keep up with the demands of new 3D software.

There are possible workarounds: both ATI/AMD and Nvidia are working on their respective Crossfire and SLI platforms, which allow paired video cards, or even a video card plus an integrated GPU, to work together for more processing power. This technology is still in its infancy, and it remains to be seen whether significant improvements can be had by offering it as an upgrade path.

On an interesting note, while looking at this topic I have been thumbing through the bandwidth capacities of various components in a computer, and I will summarize what I've noticed so far (this information is available on Wikipedia's list of device bandwidths page):

FSB: 4 GB/sec (up to 22 GB/sec in future revisions)
PCIe 1.1: 250 MB/sec per lane; one device may use up to 16 lanes, or 4 GB/sec
PCIe 2.0: 8 GB/sec (16 lanes), 16 GB/sec (32 lanes)

RAM:
For DDR2 800, 1000 (dual channel): 12.8 GB/sec, 16.0 GB/sec
For DDR3: 21.2 GB/sec ~ 25.6 GB/sec

HD BUS:
For UDMA 133: 133 MB/sec
For SATA 150, 300: 187.5 MB/sec, 375 MB/sec

HD devices:
For Solid State Drives: 170~300 MB/sec read, 105 MB/sec write
For Normal Hard Drives: 44.2 ~ 111.4 MB/sec (depending on RPM and location of read)

DVD Drive (16x): 21.1 MB/sec
USB 2.0: 60 MB/sec
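To make the gap in these figures concrete, here is a small sketch that normalizes a selection of them to MB/sec and compares each against dual-channel DDR2 800. The 50 MB/sec hard drive entry is an assumed typical sustained rate, not a value from the list:

```python
# Selected bandwidth figures from the list above, normalized to MB/sec.
# The hard drive entry is an assumed typical sustained rate.
bandwidths_mb = {
    "DDR2 800 (dual channel)": 12800.0,
    "FSB": 4000.0,
    "SATA 300": 375.0,
    "SSD (read)": 300.0,
    "USB 2.0": 60.0,
    "Hard drive (sustained)": 50.0,
    "DVD 16x": 21.1,
}

ram = bandwidths_mb["DDR2 800 (dual channel)"]
for name, mb in sorted(bandwidths_mb.items(), key=lambda kv: -kv[1]):
    print(f"{name:26s} {mb:8.1f} MB/sec  ({ram / mb:6.1f}x below RAM)")
```

The hard drive line comes out around 256x below RAM, which is the two-orders-of-magnitude gap discussed next.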

So what does this all mean? Looking at this data, it becomes very apparent to me that the bandwidth of RAM and the FSB is quite sufficient for current computational applications. I find it rather unlikely that the average user will come close to maxing out this bandwidth unless running very intensive simulations or games.

However, it also becomes quite apparent that the main bottleneck in most computer systems lies with the mass storage components, mainly the hard drive, whose read and write throughput is two orders of magnitude below that of the faster RAM and FSB. The solid state drive is obviously much faster than a standard hard drive, but it still lags far behind what the rest of the computer can handle.

With this in mind, it becomes easy to understand why it may take upwards of a minute to boot your computer. Assume that your OS has 3 GB of files to read and your hard drive sustains an average of 50 MB/sec: reading that data alone takes approximately 60 seconds. This is why sleep and hibernation have become attractive alternatives to completely shutting down the computer: the system's state is preserved, either in RAM or as a single large image on disk, rather than reloaded file by file.
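The boot-time arithmetic above, made explicit (the 3 GB and 50 MB/sec figures are the assumptions stated in the paragraph):

```python
# Boot-time estimate: time = data to read / sustained read rate.
os_data_mb = 3 * 1000          # ~3 GB of OS files, in MB
hd_rate_mb_per_sec = 50        # assumed average sustained read rate
boot_seconds = os_data_mb / hd_rate_mb_per_sec
print(boot_seconds)            # prints 60.0
```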

Thus, I do not see any major advantage to GPU+CPU integrated chips on desktop systems. Even if GPU+CPU combinations are put out, it will be necessary to give the GPU a dedicated memory controller and memory, which might be better accomplished on a separate video card. The GPU+CPU combination may be useful in portable devices, though I highly doubt the GPU in that case will be doing any high-performance rendering. A second possible application might be research workloads using stream computations, as seen in the Folding@home project. Researchers at Harvard have shown that GPUs are 20 to 40 times faster at processing their datasets than a normal CPU. A large cluster of GPUs or GPU+CPUs could be incredibly useful in these research endeavors, though I would think the market for such computers is small.

As of this moment, I think the best performance gains are to be had from developing much faster mass storage devices. As a result, I will most definitely be pursuing a computer with a more modest CPU and a dual-drive RAID 1 setup, which offers me both data redundancy and better read throughput.
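As a rough illustration of why mirroring can help reads, here is a sketch of the idealized case. The function and the 2x read scaling are my simplifications, not measurements; real controllers and workloads will see less:

```python
# Idealized RAID 1 model: with two mirrored disks, independent reads can
# be split across both spindles, while every write must hit both disks.

def raid1_throughput(single_disk_mb_per_sec, disks=2):
    """Best-case RAID 1: reads scale with the number of mirrors,
    writes are limited to a single disk's rate."""
    return {
        "read": single_disk_mb_per_sec * disks,   # best case, split reads
        "write": single_disk_mb_per_sec,          # every disk writes a copy
    }

print(raid1_throughput(50))   # prints {'read': 100, 'write': 50}
```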

1 comment:

Unknown said...

hehe it will be interesting to see where things go - AMD seems to be going the integrated CPU+GPU route with "Fusion" - hoping to integrate graphics functions as an x86 extension à la x86-64. Intel seems to lean towards integrating some functionality into the CPU - some word of SSE4 has been floating about. However, Intel is mostly going with many cores as a GPU substitute.

Larrabee (x86 as a GPU) http://en.wikipedia.org/wiki/Larrabee_(GPU)

From Ars - interesting use of a many cored x86 processor (maybe realtime raytracing?): http://arstechnica.com/news.ars/post/20080117-larrabee-becomes-laterbee.html

Very recent editorial speculating about Intel and AMD roadmaps:
http://www.extremetech.com/article2/0,1697,2279786,00.asp