RSX Reality Synthesizer
The RSX 'Reality Synthesizer' is a proprietary graphics processing unit co-developed by Nvidia and Sony for the PlayStation 3 game console. It is based on the Nvidia 7800GTX graphics processor and, according to Nvidia, is a G70/G71 hybrid architecture with some modifications. The RSX has separate vertex and pixel shader pipelines. The GPU uses 256 MB of GDDR3 RAM clocked at 650 MHz, with an effective transmission rate of 1.3 GHz, and can access up to 224 MB of the 3.2 GHz XDR main memory via the CPU.
Although the RSX carries out the majority of the graphics processing, the Cell Broadband Engine, the console's CPU, is also used for some of the console's graphics-related computational loads.
Specifications
Unless otherwise noted, the following specifications are based on a press release by Sony at the E3 2005 conference, slides from the same conference, and slides from a Sony presentation at the 2006 Game Developers Conference.
- 500 MHz on 90 nm process, 300+ million transistors
- Based on NV47
- Little Endian
- Multi-way programmable parallel floating-point shader pipelines, independent pixel/vertex shader architecture
- * 24 parallel pixel-shader ALU pipelines clocked at 550 MHz
- ** 5 ALU operations per pipeline, per cycle
- ** 16 floating-point operations per pipeline, per cycle
- ** Floating Point Operations per second : 211.2 GFLOPS
- * 8 parallel vertex pipelines
- ** 2 ALU operations per pipeline, per cycle
- ** 10 FLOPS per pipeline, per cycle
- ** Floating Point Operations per second : 40 GFLOPS
- * Total Floating Point Operations per second : 251.2 GFLOPS
- * 24 texture filtering units and 8 vertex texture addressing units
- ** 24 filtered samples per clock
- ** Maximum Texel fillrate: 12.0 Gigatexels per second
- ** 32 unfiltered texture samples per clock
- * 8 render output units / pixel rendering pipelines
- ** Peak pixel fillrate : 4.0 Gigapixel per second
- ** Maximum Z-buffering sample rate: 8.0 Gigasamples per second
- * Maximum dot product operations: 51 billion per second
- * 128-bit pixel precision offers High Dynamic Range rendering
- 256 MB GDDR3 RAM at 650 MHz
- * 128-bit memory bus width
- * 20.8 GB/s read and write bandwidth
- Cell FlexIO bus interface
- * Rambus XDR Memory interface bus width: 56bit out of 64bit
- * 20 GB/s read to the Cell and XDR memory
- * 15 GB/s write to the Cell and XDR memory
- 576 KB texture cache
- Support for PSGL
- Support for S3 Texture Compression
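The headline throughput figures in this list follow from the per-unit numbers by simple multiplication. The sketch below recomputes them as a consistency check. It assumes that the vertex, texture and render output rates are taken at the 500 MHz core clock (the reading that reproduces the listed totals) and that the GDDR3 transfers data twice per clock, as implied by the 1.3 GHz effective rate. The variable names are illustrative only.

```python
# Recompute the listed peak figures from the per-unit numbers.
MHZ = 1e6

core_clock_hz = 500 * MHZ    # core clock from the first list entry
pixel_clock_hz = 550 * MHZ   # clock quoted for the pixel-shader pipelines

# pipelines x floating-point operations per pipeline per cycle x clock
pixel_gflops = 24 * 16 * pixel_clock_hz / 1e9
vertex_gflops = 8 * 10 * core_clock_hz / 1e9   # assumes the 500 MHz core clock
total_gflops = pixel_gflops + vertex_gflops

# units x one sample (or pixel) per clock x clock
texel_rate_g = 24 * core_clock_hz / 1e9
pixel_rate_g = 8 * core_clock_hz / 1e9

# bus width in bytes x memory clock x 2 transfers per clock (assumed DDR)
gddr3_gb_s = (128 // 8) * 650 * MHZ * 2 / 1e9

print(f"pixel shaders : {pixel_gflops:.1f} GFLOPS")
print(f"vertex shaders: {vertex_gflops:.1f} GFLOPS")
print(f"total         : {total_gflops:.1f} GFLOPS")
print(f"texel fillrate: {texel_rate_g:.1f} Gtexels/s")
print(f"pixel fillrate: {pixel_rate_g:.1f} Gpixels/s")
print(f"GDDR3         : {gddr3_gb_s:.1f} GB/s")
```

These are theoretical peaks derived from the list, not measured performance.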
Model numbers
90nm:
- CXD2971AGB
- CXD2971DGB
- CXD2971GB
- CXD2971-1GB
- CXD297BGB
- CXD2982
- CXD2982GB
- CXD2991GB
- CXD2991BGB
- CXD2991GGB
- CXD2991CGB
- CXD2991EGB
- CXD5300AGB
- CXD5300A1GB
- CXD5301DGB
- CXD5302DGB
- CXD5302A1GB
Local GDDR3 physical memory structure
- Total Memory 256MB
- 2 Partitions
- 64bit bus per partition
- 8 Banks per partition
- 4096 Pages per bank -> 12bit Row Address
- Memory block in a page -> 9bit Column Address
- Minimum access granularity = 8 bytes -> same as buswidth between RSX <> GDDR
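The field widths above add up to exactly the 28 address bits needed for 256 MB. The sketch below checks that sum and splits a byte address into the listed fields. The order in which the fields are packed into the address is an assumption made for illustration; the list does not say how the hardware interleaves partitions and banks, and `split_address` is a hypothetical helper.

```python
# Field widths taken from the list above.
PARTITION_BITS = 1   # 2 partitions
BANK_BITS = 3        # 8 banks per partition
ROW_BITS = 12        # 4096 pages per bank
COLUMN_BITS = 9      # 512 memory blocks per page
BYTE_BITS = 3        # 8-byte minimum access granularity

total_bits = PARTITION_BITS + BANK_BITS + ROW_BITS + COLUMN_BITS + BYTE_BITS
total_bytes = 1 << total_bits          # 2**28 bytes = 256 MB
page_bytes = 1 << (COLUMN_BITS + BYTE_BITS)  # 512 blocks x 8 bytes


def split_address(addr):
    """Split a byte address into fields.

    ASSUMED layout, most significant first:
    partition | bank | row | column | byte.
    The real hardware mapping is not given in the text.
    """
    if not 0 <= addr < total_bytes:
        raise ValueError("address outside the 256 MB local memory")
    return {
        "byte": addr & 0x7,
        "column": (addr >> 3) & 0x1FF,
        "row": (addr >> 12) & 0xFFF,
        "bank": (addr >> 24) & 0x7,
        "partition": (addr >> 27) & 0x1,
    }


print(total_bits, total_bytes // (1024 * 1024), page_bytes)
print(split_address(0xFFFFFFF))
```

Under these widths each page holds 512 blocks of 8 bytes, that is, 4 KB.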
RSX memory map
Address Range | Size | Comment |
0000000-FBFFFFF | 252 MB | Framebuffer |
FC00000-FFFFFFF | 4 MB | GPU Data |
FF80000-FFFFFFF | 512KB | RAMIN: Instance Memory |
FF90000-FF93FFF | 16KB | RAMHT: Hash Table |
FFA0000-FFA0FFF | 4KB | RAMFC: FIFO Context |
FFC0000-FFCFFFF | 64KB | DMA Objects |
FFD0000-FFDFFFF | 64KB | Graphic Objects |
FFE0000-FFFFFFF | 128KB | GRAPH: Graphic Context |
Besides local GDDR3 memory, the RSX can also access main XDR memory, with the accessible range limited to either:
- 0MB - 256MB
- 0MB - 512MB
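The sizes in the memory map table can be checked directly from the address ranges. The sketch below does so and also shows that several ranges overlap: the entries from RAMIN downward lie inside the 4 MB GPU Data area. Reading those entries as nested sub-regions is an interpretation of the overlapping ranges, and `regions_containing` is a hypothetical helper.

```python
# (start, end inclusive, name) copied from the memory map table.
REGIONS = [
    (0x0000000, 0xFBFFFFF, "Framebuffer"),
    (0xFC00000, 0xFFFFFFF, "GPU Data"),
    (0xFF80000, 0xFFFFFFF, "RAMIN: Instance Memory"),
    (0xFF90000, 0xFF93FFF, "RAMHT: Hash Table"),
    (0xFFA0000, 0xFFA0FFF, "RAMFC: FIFO Context"),
    (0xFFC0000, 0xFFCFFFF, "DMA Objects"),
    (0xFFD0000, 0xFFDFFFF, "Graphic Objects"),
    (0xFFE0000, 0xFFFFFFF, "GRAPH: Graphic Context"),
]

KB = 1024
MB = 1024 * KB


def region_size(start, end):
    """Size in bytes of an inclusive address range."""
    return end - start + 1


def regions_containing(addr):
    """Names of every listed range that covers addr, in table order."""
    return [name for start, end, name in REGIONS if start <= addr <= end]


for start, end, name in REGIONS:
    print(f"{name:26s} {region_size(start, end) // KB:>7d} KB")

print(regions_containing(0xFF90000))
```

For example, an address at the start of RAMHT is covered by three rows of the table: GPU Data, RAMIN and RAMHT.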
Speed, bandwidth and latency
- Cell to/from 256MB XDR : 25.6 GB/s
- Cell to RSX : 20GB/s
- Cell from RSX : 15GB/s
- RSX to/from 256MB GDDR3 : 20.8GB/s
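To give these bandwidth figures a sense of scale, the sketch below estimates how long one buffer would take to cross each link at its peak rate. The buffer (a 1280x720 image at 4 bytes per pixel) is a hypothetical example, and 1 GB/s is assumed to mean 10^9 bytes per second. Latency and protocol overhead are ignored.

```python
# Peak rates in GB/s, copied from the list above.
LINKS_GB_S = {
    "Cell to/from XDR": 25.6,
    "Cell to RSX": 20.0,
    "Cell from RSX": 15.0,
    "RSX to/from GDDR3": 20.8,
}

# Hypothetical payload: one 1280x720 buffer, 4 bytes per pixel.
buffer_bytes = 1280 * 720 * 4


def transfer_ms(num_bytes, gb_per_s):
    """Ideal transfer time in milliseconds, assuming 1 GB = 10**9 bytes."""
    return num_bytes / (gb_per_s * 1e9) * 1e3


times_ms = {name: transfer_ms(buffer_bytes, rate) for name, rate in LINKS_GB_S.items()}

for name, ms in times_ms.items():
    print(f"{name:18s} {ms:.3f} ms")
```

Even the slowest link listed here moves such a buffer in a fraction of a millisecond, so at these peak rates the links themselves are not the limiting factor for a single frame.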
Speed table
Because of the very slow Cell read speed from the 256MB GDDR3 memory, it is more efficient for the Cell to work in XDR and then have the RSX pull data from XDR and write to GDDR3 for output to the HDMI display. This is why extra texture lookup instructions were included in the RSX to allow loading data from XDR memory.
RSX libraries
The RSX is dedicated to 3D graphics, and developers are able to use different API libraries to access its features. The easiest way is to use the high-level PSGL, which is basically OpenGL ES with a programmable pipeline added in; however, it is unpopular due to the performance overhead on a relatively weak console CPU. At a lower level, developers can use LibGCM, an API that builds RSX command buffers directly. This is done by setting up commands and DMA Objects and issuing them to the RSX via DMA calls.
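The general idea of a command buffer can be illustrated with a toy model: the CPU appends commands and advances a "put" pointer, and the GPU reads commands up to that pointer, advancing its own "get" pointer. The sketch below is a generic illustration only. It is not the LibGCM API, and the class, the method names and the header layout are all hypothetical.

```python
class CommandBuffer:
    """Toy FIFO of 32-bit words. Hypothetical; not the LibGCM API."""

    def __init__(self, size_words):
        self.words = [0] * size_words
        self.put = 0  # next slot the producer (CPU side) writes
        self.get = 0  # next slot the consumer (GPU side) reads

    def push(self, method, *args):
        """Append one command: a header word followed by its arguments."""
        needed = 1 + len(args)
        if self.put + needed > len(self.words):
            raise BufferError("command buffer full")
        # Made-up header layout: argument count in the high 16 bits,
        # method id in the low 16 bits.
        self.words[self.put] = (len(args) << 16) | (method & 0xFFFF)
        for i, arg in enumerate(args):
            self.words[self.put + 1 + i] = arg & 0xFFFFFFFF
        self.put += needed

    def consume(self):
        """Stand-in for the GPU reading every command up to the put pointer."""
        decoded = []
        while self.get < self.put:
            header = self.words[self.get]
            count = header >> 16
            method = header & 0xFFFF
            args = self.words[self.get + 1 : self.get + 1 + count]
            decoded.append((method, args))
            self.get += 1 + count
        return decoded


cb = CommandBuffer(16)
cb.push(0x100, 1, 2)   # hypothetical method id with two arguments
cb.push(0x200)         # hypothetical method id with no arguments
commands = cb.consume()
print(commands)
```

A real implementation would also handle wrap-around and synchronisation between the two pointers, which this sketch leaves out.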
Differences with the G70 architecture
The RSX 'Reality Synthesizer' is based on the G70 architecture, but features a few changes to the core. The biggest difference between the two chips is how they use memory bandwidth. The G70 only supports rendering to local memory, while the RSX is able to render to both system and local memory. Since rendering from system memory has a much higher latency than rendering from local memory, the chip's architecture had to be modified to avoid a performance penalty. This was achieved by enlarging the chip to accommodate larger buffers and caches that keep the graphics pipeline full. As a result, the RSX has only 60% of the local memory bandwidth of the G70, making it necessary for developers to use system memory in order to achieve performance targets.
Difference | RSX | Nvidia 7800GTX |
GDDR3 Memory bus | 128bit | 256bit |
ROPs | 8 | 16 |
Post Transform and Lighting Cache | 63 max vertices | 45 max vertices |
Total Texture Cache Per Quad of Pixel Pipes | 96kB | 48kB |
CPU interface | FlexIO | PCI-Express 16x |
Technology | 40 nm/65 nm/90 nm | 110 nm |
Other RSX features/differences include:
- More shader instructions
- Extra texture lookup logic
- Fast vector normalize
Press releases
Nvidia CEO Jen-Hsun Huang stated during Sony's pre-show press conference at E3 2005 that the RSX is twice as powerful as the GeForce 6800 Ultra.