Tuesday, March 04, 2008

The Future of Computing: Integrated GPU and CPUs

For those who might not be as well versed in computer lingo, a graphics processing unit (otherwise known as a GPU) is the heart of most video processing devices. It differs from a CPU in its ability to run many calculations in parallel, whereas a CPU runs one process at a time, but very quickly.

This difference comes down to application: the CPU is designed to run a single program quickly, while the GPU performs many calculations simultaneously to render graphics on a computer screen. What we will be seeing in the future is a merging of these two chips into a single processing unit, and the company best suited to do it is probably AMD, should they work out the kinks in their merger with ATi.
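To make the serial-versus-parallel distinction a little more concrete, here is a minimal C sketch (the frame size and brightness factor are made-up numbers, and the OpenMP loop is only a rough CPU-side stand-in for what a GPU does with far more threads):

    /* gcc -std=c99 -fopenmp brighten.c */
    #include <stdio.h>
    #include <omp.h>

    #define WIDTH  1024   /* made-up frame size, for illustration only */
    #define HEIGHT 768

    static float frame[WIDTH * HEIGHT];

    int main(void)
    {
        /* Serial version: a single CPU core walks the pixels one at a time. */
        for (int i = 0; i < WIDTH * HEIGHT; i++)
            frame[i] *= 1.2f;                 /* brighten each pixel by 20% */

        /* Parallel version: the same work split across however many cores
         * are available.  A GPU takes the idea much further, effectively
         * giving each pixel its own lightweight thread. */
        #pragma omp parallel for
        for (int i = 0; i < WIDTH * HEIGHT; i++)
            frame[i] *= 1.2f;

        printf("processed %d pixels on up to %d threads\n",
               WIDTH * HEIGHT, omp_get_max_threads());
        return 0;
    }

The serial loop is how a single CPU core sees the job; the GPU effectively turns every iteration into its own tiny thread.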

Several years ago, the computer industry was obsessed with improving performance by ramping up the CPU clock speed, the number of calculation cycles a single set of computer circuitry can handle per second. Unfortunately, further advancement in this area provides diminishing returns: a CPU's dynamic power draw climbs roughly in proportion to its clock frequency (and with the square of the voltage needed to sustain it), which is undesirable now that many CPUs end up in battery-operated laptops, and it is becoming increasingly challenging to push clock speeds higher at all. So the computer industry has instead taken a simpler route: integrating several processing units into a single silicon package to reap further performance gains. Improved patterning technologies allow makers such as AMD and Intel to build smaller and smaller CPU cores, making it cheaper to place several processors on a single chip, which is why multi-core CPUs are what is currently sold on the market. I believe that this trend will change in the coming years.

A new revolution in computing devices is occurring with the push to smaller and more portable computing systems. My previous post touched lightly on the growing trend of ultra-portable computers -- expect to see more of them in the future. With computers moving in this direction, there will be a need for tighter integration of computer parts to achieve further miniaturization. I have covered this before in a previous post but will reiterate some of the important points here.

Components for both desktop and laptop computers are made of discrete parts. In the desktop case, these parts are modular cards that can be swapped in and out of a mainboard: if you want a new video card, simply buy a newer version and swap it in. Laptops, on the other hand, do away with the modular design and integrate the video, sound and networking chips onto a single mainboard for a compact design. This miniaturization will continue, but it will now happen at the silicon chip level rather than the circuit board level, and I believe that simply adding more processor cores to a silicon die is the wrong solution.

I predict that the next leap in processing technology will be the integration of a CPU and a GPU core onto a single die. I believe that this will be a very big step in the development of smaller ultra-portable hand-held computing devices.

The CPU and the GPU are the most computationally intensive chips in a computer system, and the CPU must continually communicate with the GPU to render graphics onto the monitor. Current computer architecture introduces a communication lag between the two. The following diagram illustrates the issue:


The northbridge/southbridge configuration of a computer (source: Wikipedia)

Computer chips and peripheral devices are connected through memory controller chips, otherwise known as the northbridge and southbridge. Data flows between devices along shared sets of data lines, otherwise known as a "bus." Because of the amount of data they continually process, video cards are generally given a fast bus so they can exchange data quickly with the CPU or with system RAM (though most video cards come with their own on-board RAM, so the latter is rarely needed). Unfortunately, the transfer limit of the bus between the CPU and GPU is a bottleneck, especially for high-resolution 3D rendering: data cannot reach the GPU quickly enough, sometimes resulting in choppy video at higher resolutions. The obvious solution is tighter integration of the CPU and GPU, giving superior data transfer rates and processing capability. Combining the chips would also remove the need for a northbridge chip, simplifying the computer architecture further.
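As a back-of-the-envelope illustration, here is a small C sketch comparing how long one frame's worth of data takes to cross the CPU-GPU bus versus being read from the video card's own memory. The bandwidth figures are rough, assumed orders of magnitude for hardware of this era, not measurements of any particular machine:

    #include <stdio.h>

    int main(void)
    {
        /* Rough, assumed figures -- orders of magnitude, not measurements. */
        const double frame_bytes  = 1600.0 * 1200.0 * 4.0;  /* one 32-bit colour frame */
        const double bus_bytes_s  = 4.0e9;    /* 16-lane PCI Express, nominal peak  */
        const double vram_bytes_s = 60.0e9;   /* on-card graphics memory, roughly   */

        printf("one frame across the CPU-GPU bus: %.2f ms\n",
               1000.0 * frame_bytes / bus_bytes_s);
        printf("same frame from local video RAM:  %.2f ms\n",
               1000.0 * frame_bytes / vram_bytes_s);
        return 0;
    }

Data already sitting next to the GPU moves an order of magnitude faster than data that has to cross the bus, which is the heart of the argument for bringing the two chips closer together.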

There are also other advantages to having a GPU core integrated with a CPU, primarily because a GPU is a parallel processing unit, even if it is somewhat slower than a CPU. Most operating systems run several concurrent processes at the "same time." In reality, this is an illusion made possible by the CPU's speed: it rotates between the processes so quickly that they appear to run simultaneously. This is not an ideal solution, because the computer must save the state of one process while it switches to another, and that switching overhead is what often slows down computers running many concurrent programs. It works well enough for processes that can tolerate a little lag, since the CPU can run off and do something else while a process waits for data or commands to arrive.
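As a rough illustration of that switching cost, here is a small C sketch (POSIX systems only; the round count is arbitrary) in which two processes bounce a single byte back and forth through pipes, forcing the kernel to hand control from one to the other on every exchange:

    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>
    #include <sys/time.h>
    #include <sys/wait.h>

    #define ROUNDS 100000   /* arbitrary number of exchanges */

    int main(void)
    {
        int a[2], b[2];
        char byte = 'x';

        if (pipe(a) || pipe(b)) { perror("pipe"); return 1; }

        if (fork() == 0) {                    /* child process              */
            for (int i = 0; i < ROUNDS; i++) {
                read(a[0], &byte, 1);         /* wait for the parent        */
                write(b[1], &byte, 1);        /* hand control straight back */
            }
            exit(0);
        }

        struct timeval start, end;
        gettimeofday(&start, NULL);
        for (int i = 0; i < ROUNDS; i++) {    /* parent process             */
            write(a[1], &byte, 1);
            read(b[0], &byte, 1);
        }
        gettimeofday(&end, NULL);
        wait(NULL);

        double us = (end.tv_sec - start.tv_sec) * 1e6
                  + (end.tv_usec - start.tv_usec);
        printf("~%.2f microseconds per round trip\n", us / ROUNDS);
        return 0;
    }

On a single CPU core each round trip forces at least two context switches, and those microseconds add up quickly when many programs are competing for the processor's attention.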

Unfortunately, this breaks down when multiple processes demand continual CPU attention, and since a CPU has the equivalent of a one-track mind, it becomes advantageous to have other computing components deal with those continual demands. This is why discrete chips were originally used for specialized functions such as video, sound and networking. Enter the GPU, the silver bullet.

Because of the parallel design of the GPU, integrating such a chip alongside the CPU will allow for significant performance benefits: the computer can handle more processes simultaneously while keeping the main CPU free for computationally intensive work. The GPU in its current state could even replace the sound chip, with all audio processing done directly on the GPU. This is advantageous since sound and video often go hand-in-hand in video applications.

This design mirrors the architecture of the human brain and body. It should be no surprise that our senses of sight, hearing and taste are kept close to the brain, reducing the lag in processing and responding to important sensory data. The sense of touch is somewhat different since it encompasses the whole body, but it is interesting to note that the body has a spare processor of sorts in the spine, handling reflex responses to sudden, dangerous touch stimuli.

The future of computing will go in this direction, and today's multi-core CPUs are a crude step along the way -- expect more elegant GPU-CPU chips to come out in the next few years, with solutions in the works from both Intel and AMD.

I believe AMD is in the best position to accomplish this after acquiring ATi and its portfolio of design expertise in video card technology. Unfortunately, the merger is not going very well at the moment, causing AMD to stumble. AMD's current CPUs are somewhat inferior to Intel's in terms of maximum computational power, but they offer better price/performance. Intel, meanwhile, has limited design experience here: its motherboard-integrated video chips provide the poorest performance when compared against ATi's and Nvidia's solutions. And since Nvidia does not have the same close working relationship with Intel that ATi has with AMD, I doubt an Intel-Nvidia CPU-GPU combination will be produced in the near future unless AMD-ATi dominates the market with its offerings.

The target for AMD's new CPU-GPU chips will most likely be the growing mobile computing market segment, and I believe this to be the wisest move. The next few years in computing technology will be quite interesting.

6 comments:

Unknown said...

very good article indeed

Paladiamors said...

Thank you very much!

Unknown said...

Actually, I think the opposite - in a sense. I think that multi-core solutions are where things will end up in the end. CPUs already incorporate many of the technologies that were in early GPUs, and GPUs are moving towards becoming more generalized processors. SIMD (Single Instruction, Multiple Data) technologies (branded as MMX, SSE, etc. on CPUs) are effectively what GPUs do, only GPUs dedicate the whole silicon to the work.
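For example, a single SSE instruction can add four pairs of floats at once - a minimal C sketch, with arbitrary values:

    /* gcc -std=c99 -msse sse_add.c */
    #include <stdio.h>
    #include <xmmintrin.h>   /* SSE intrinsics */

    int main(void)
    {
        float a[4]   = { 1.0f, 2.0f, 3.0f, 4.0f };
        float b[4]   = { 10.0f, 20.0f, 30.0f, 40.0f };
        float out[4];

        /* One instruction adds all four pairs of floats at once:
         * a single instruction applied to multiple data elements. */
        __m128 va = _mm_loadu_ps(a);
        __m128 vb = _mm_loadu_ps(b);
        _mm_storeu_ps(out, _mm_add_ps(va, vb));

        printf("%.0f %.0f %.0f %.0f\n", out[0], out[1], out[2], out[3]);
        return 0;
    }

A GPU applies the same single-instruction-multiple-data idea, but across hundreds of elements at a time rather than four.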

GPUs have introduced programmability with their pixel shaders - basically a language that programmers can use to create custom special effects. This has also led to initiatives from Nvidia and ATi (AMD, I guess) to leverage that programmability for scientific computing (an area where the SIMD focus of GPUs is of great benefit). I believe both companies have APIs for this.

There already exist many SoC (system on a chip) solutions where everything (video, processing, networking, etc.) sits on a single piece of silicon - there are many others, but the AMD Geode and the Pentium M 745 come to mind. These are often used in embedded or small devices. While they have space, energy and cost benefits, they do not have the performance of any of the current mainstream solutions. Creating such an integrated chip at mainstream performance, while possible, would probably be too cost prohibitive due to die size and transistor count for the next couple of years.

System bus bandwidth in the current architecture is sufficient (or at least no more insufficient than in a CPU+GPU chip), and point-to-point protocols like HyperTransport reduce the need for an integrated solution - latency could be considered an issue, but bandwidth wouldn't be. Additionally, in a CPU+GPU setup the GPU's memory would have to be shared with the CPU, creating additional memory bandwidth issues - this is one reason the performance of integrated graphics is so poor.

Intel's many-core research: http://techresearch.intel.com/articles/Tera-Scale/1449.htm. As long as the arbiter managing the cores is efficient (that is a BIG if) and there is enough memory bandwidth (another BIG if), I could see this sort of solution easily rivalling any integrated CPU+GPU solution. Need more raw integer crunching over floating-point 3D? The system can dedicate more of the die's cores to that task. I'm not saying this is a trivial undertaking, but the benefits are there.

I can see CPU+GPU as a stop gap, but I think "many, many core systems" are the overall end goal.

The quick version: many general-purpose cores are more flexible and better than mixing special-purpose and general-purpose cores on a single die, and that is where I think things will eventually head. Integrated CPU+GPU currently provides benefits in power consumption and system size, but does not improve performance.

Paladiamors said...

I've taken a look at the link you provided, and it seems the paper presents 80 CPU cores in a sort of mesh network. A video of the thing in operation is on YouTube here: http://youtube.com/watch?v=DkFlwKSzHms

I would have to say that simply throwing CPU cores at a problem like this isn't very hardware-efficient, but this of course depends on the application of these chips.

What I have read so far (and correct me if I am wrong) is that GPUs tend to be far more parallel than CPUs simply because they are designed for parallel, matrix-type calculations. This level of parallelism is obviously important in graphics processing, where the datasets are incredibly large, and a CPU is obviously not well suited to the task.

I will agree that technologies such as MMX and SSE do offer some parallelism, but not to the extent that a GPU is capable of achieving. A GPU has many smaller processing units running in parallel to churn out matrix calculations, whereas a CPU has fewer cores but runs at a faster clock and has access to more memory to do calculations in serial. A CPU's extra pipelines and execution units are there to help with branch prediction and the like, which serve a different purpose than a GPU's parallelism.

GPUs and CPUs are designed for very different tasks, and data throughput is very design dependent. IBM's Deep Blue computer, used in chess competitions, was based on a highly parallel computing system that simulated many different branches of a chess game simultaneously to come up with a solution. A traditional CPU would have trouble keeping up with such a parallel problem.

I think the CPU vs. GPU debate is more of a design trade-off over the most effective way to use on-die transistors. If a computer were made out of CPUs only - by which I mean HDD I/O, video, networking and the rest of the work spread across several CPU cores - it might be a little overkill. It might be more effective to use specialized hardware.

I believe the argument would then become: why bother with a north/south bridge configuration at all when we could plop in a GPU or a CPU to handle those jobs instead? I think we can agree here that specialized hardware is better suited for these tasks.

I do agree that with further feature shrinks it might be very tempting to say we can just replace a GPU with a bunch of CPU cores, but depending on the problem, that silicon space might be better spent on a GPU. Effective use of transistors is, I think, very important, because as long as we can build more powerful processors, we will continue to find more demanding applications for them.

Paladiamors said...

Here is a case where more GPUs might be more advantageous compared to more CPUs:

http://www.youtube.com/watch?v=SMxfzil5_qU&NR=1

Unknown said...

I agree that it is a design trade off between the highly focussed design of a GPU and the versatility of a CPU.

I took a look at the Blender video; I'd actually say it is probably a case for using CPUs over GPUs. I'd guess it was actually rendered using ray tracing instead of rasterization (given the amount of CPU power required - one of the comments also mentioned it), and GPUs currently cannot assist in those sorts of calculations. In this case, adding more cores gives you a tangible benefit in ray-tracing performance.

Something I read recently regarding ray tracing:
http://www.intel.com/technology/itj/2005/volume09issue02/art01_ray_tracing/vol09_art01.pdf

I think it's hard to say that we can just stuff a GPU together with a CPU. I think it's more a question of "what commonly done calculations can be shifted over to the CPU?" (the philosophy behind MMX and SSE; PowerPCs had it better with AltiVec, but only IBM really uses them now). The local memory contention between a fused GPU and CPU would cause severe bandwidth constraints in any current architecture unless the GPU functions had their own path to dedicated memory, or the CPU+GPU combination had a cache of hundreds of megabytes for texture data.

At that point the transistor count would be through the roof and no longer financially viable in terms of chips per wafer and wafer waste due to defects.

Another example of why we shouldn't integrate a GPU with a CPU is integrated video - not only is the video performance poor, CPU performance is dragged down because of the reduced memory bandwidth. Just popping in a cheap $40 video card will improve overall system performance in many cases.

On top of this, you lose any sort of upgrade path and are tied to a single company's CPU+GPU products.

You also mentioned Deep Blue - I think a multi-core CPU would be a great way to re-implement that chess machine. Multi-core systems, while designed for "Multiple Instructions, Multiple Data," are perfectly capable of parallel tasks as well. Additionally, I think the AI in Deep Blue was not really AI at all - it was just very fast at calculating all the possible moves. I think a modern system using the same algorithm as Deep Blue would be able to beat the original, just because it is faster.

I can see integration of discrete components as a cost-saving measure in lower-performance systems, but not as a way to improve performance overall.

I'm reminded of that picture of a triangle with Cheap, Fast and Good on each of the vertices.
"Cheap, Fast, Good. Pick two." The other one being: "Small, Fast, Cheap. Pick two."

I think that integrating formerly discrete PC components onto a single die (i.e. a system on a chip) is "Good and Cheap," or can be "Good and Fast." SoC systems seem to sit out at the extremes with no real middle ground.

I haven't worked out where our current computer architecture fits into this yet, but it seems to sit between the extremes of integrated SoCs, giving a better compromise between cost and performance (while badly losing out on power consumption and size, however).

It's a bit late and I'm about to fall asleep. let me know if I wasn't clear. :)