Yet Another Gamecube Documentation

2  Gamecube Hardware Introduction


The GameCube is a powerful piece of hardware. The whole system is based on the IBM PowerPC Gekko processor and the custom ATI Flipper video system. The PowerPC Gekko processor is really just a PowerPC 750 with a few enhancements.

2.1  enhanced PowerPC 750 Specification


The enhanced PowerPC Gekko processor also contains many features that minimize processor delays caused by data accesses and maximize processing throughput. The instruction set of the PowerPC Gekko processor seems to be almost identical to that of the PowerPC 750 processor. The only visible difference at the moment is that the PowerPC Gekko processor has a few AltiVec/SIMD opcodes added to its final instruction set.

2.2  Consumer Units



2.2.1  Nintendo


2.2.1.1   HW1

HW1 was an initial, buggy version of the GameCube hardware that wasn't sold at retail.
2.2.1.2   HW2

HW2 is the first hardware that was sold in stores to the public.
2.2.1.3   HW2 'second edition'

The second edition models are missing the "Serial Port 2" that the first edition had. The plastic cover is still on the bottom of the cube, where the port used to be, but there's just a metal plate underneath it, and no connector.
2.2.1.4   HW2 'third edition'

The third edition Gamecubes are missing both the "Serial Port 2", and the Digital A/V connector.

2.2.2  Panasonic Q


Panasonic manufactures a GameCube combined with a DVD player, called the 'Panasonic Q'. The GameCube part seems to be exactly the same as HW2, except that the DVD drive is different.

2.3  Development Units


Nintendo provides development hardware units to official, licensed GameCube developers. There are two main versions: the GDEV and the DDH hardware development kit units. These units are the same as retail GameCube HW2 units with some changes: they have PC communication features (through either SCSI or USB), and they have DVD emulation hardware instead of the proprietary mini-DVD drive. GameCube development units also seem to run at slower processor speeds than retail GameCubes, with clock speeds ranging from around 150MHz to 400MHz. Development GameCubes also seem to contain more RAM than retail ones, namely around 40MB.

SNSystems also provides their own development kit, authorized by Nintendo, called the TDEV. According to specifications directly from SNSystems, the TDEV development hardware contains twice as much memory as retail GameCubes for debugging and a direct PC<->TDEV USB connection for fast uploading of code and/or data.

Finally, there is another proprietary development kit called the NR-Reader. NR-Readers have fewer debugging capabilities than the other development kits and are mostly meant for developers to efficiently get their demos/games to beta testers or the media. However, SNSystems reports that their ProDG development kit can be used with a special USB adapter of theirs for sending program (debug) code directly to NR-Reader GameCubes. NR-Reader GameCubes also contain different mini-DVD drives than retail GameCubes, but still use a proprietary writing/reading format which is currently unknown. The DVD drives of NR-Readers can only read special DVDs that can only be written correctly with NR-Writer hardware (which is really just a Panasonic/Matsushita SW-9501 with modified firmware).

Also, the official debug development kits possibly contain JTAG support, which is a method for debugging hardware. If so, there is a possibility that JTAG support remains in retail GameCubes as well, but this is pure hypothesis. If retail GameCubes do in fact contain JTAG debugging support, then it should be possible for (homebrew) code to be uploaded (through a JTAG cable) directly to a GameCube's RAM and executed.

2.4  Hardware Parts List



2.4.1  Connectors


There are 10 different connectors on the GameCube's mother-board. The following table contains an ID key and a short functional description.

ID Description
P1 Motherboard Power Connector - MBB - Top Left
P2 Digital Video Output Connector - MBU - Bottom Left
P3 Controller Pad Board Connector - MBU - Middle Right
P4 Memory Card Slot Connector A - MBU - Top Right
P5 Memory Card Slot Connector B - MBU - Bottom Right
P6 Serial Port Connector 1 - MBB - Top Right
P7 Analog Video Output Connector - MBU - Middle Left
P8 Serial Port Connector 2 - MBB - Top Right
P9 Mini-DVD Drive Port Connector - MBU - Top Right
P10 Hi-Speed Parallel Port Connector - MBB - Bottom Left

2.4.1.1   Memory Card Slots (P4,P5)  
pin Signal
1 EXTIN
2 GND
3 INT
4 3.3V
5 DO
6 5V
7 DI
8 3.3V
9 CS
10 Ground (Shield)
11 CLK
12 EXTOUT

2.4.1.2   High-speed Port (P8)  
pin Signal
1 3.3V
2 GND
3 INT
4 CLK
5 DO
6 DI
7 CS
8 Ground (Shield)

2.4.1.3   SDRAM/Parallel Port (P10)  
pin Signal
1 VCC
2 Ground
3 DQ0
4 DQ7
5 DQ1
6 DQ6
7 DQ2
8 DQ5
9 DQ3
10 DQ4
11 VCC
12 Ground
13 write enable
14 DQM
15 CAS
16 Clock
17 RAS
18 A12
19 CS (Chip Select)
20 A11
21 BA0
22 A9
23 BA1
24 A8
25 A10
26 A7
27 A0
28 A6
29 A1
30 A5
31 A2
32 A4
33 A3
34 INT
35 VCC
36 Ground

2.4.1.4   BBA/Modem Connector (P6)  
pin Signal
1 EXTIN
2 Ground (Shield)
3 INT
4 CLK
5 12V
6 DO
7 3.3V
8 3.3V
9 DI
10 CS
11 Ground
12 Ground

2.4.1.5   DVD Interface Connector (P9)

pin Signal
1 AISLR (audio bus)
2 5V
3 AISD (audio bus)
4 5V
5 AISCLK (audio bus)
6 5V
7 DIHSTRB
8 5V
9 DIERRB
10 Ground
11 DIBRK
12 DICOVER
13 DIDSTBR
14 DIRSTB
15 DIDIR
16 Ground
17 DID7
18 Ground
19 DID6
20 Ground
21 DID5
22 Ground
23 DID4
24 Ground
25 DID3
26 Ground
27 DID2
28 MONI
29 DID1
30 MONOUT
31 DID0
32 Ground

2.4.1.6   Power Supply Connector (P1)

pin Signal
1 Ground
2 Ground
3 3.3V
4 3.3V
5 Ground
6 Ground
7 Ground
8 Ground
9 1.8V
10 1.8V
11 1.8V
12 1.8V
13 1.55V
14 1.55V
15 1.55V
16 Ground
17 Ground
18 Ground
19 Thermo detect
20 12V
21 5V
22 5V


2.4.2  Semi-Conductors


ID Description
U1 Customized NEC Flipper Chip - MBU - Middle
U2 Customized IBM PowerPC Gekko Chip - MBU - Bottom
U3 MoSys (MS3M23B-5 A) 12MB 1-T SRAM - MBU - Top Right
U4 MoSys (MS3M23B-5 A) 12MB 1-T SRAM - MBU - Top Right
U5 NEC (D4891281G5 0125XU621) 16MB ARAM - MBU - Top Left
U6 AV Encoder (AVE N -DOL RS5C5828) - MBB - Middle Left
U7 Amplifier? (AMP - DOL 128 124) - MBB - Top Left
U8 MX Clock Generator (Part Number?) - MBU - Bottom Left
U9 MX Clock Generator (Part Number?) - MBU - Bottom Right
U10 MX RTC/IPL (8013108-M RTCN-DOL 1R6022A1)

2.4.2.1   IPL (U10)

Pin Signal
1 Clock
2  
3  
4  
5 CS
6 serial in
7 Ground
8  
9 serial out
10  
11  
12 osc - xtal2
13 osc - xtal1
14  


2.5  Details on the motherboard buses


The GameCube has three main external buses on its mother-board: the North-Bridge, the South-Bridge, and the East-Bridge. The fastest bus is the South-Bridge, which connects the two 12MB 1T-SRAM chips to the Flipper. It is 64 bits wide and clocked at about 324MHz. The North-Bridge bus connects the IBM PowerPC Gekko processor to the Flipper and is also 64 bits wide; however, it is only half as fast as the South-Bridge bus, clocked at around 162MHz. Finally, the East-Bridge bus connects the 16MB Audio RAM chip to the Flipper chip. This bus is only 8 bits wide and is by far the slowest one, clocked at only 81MHz.
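
As a rough illustration of what these figures imply, the following sketch computes a theoretical peak bandwidth for each bus from the bus widths and clock rates quoted above. It assumes exactly one transfer per clock cycle, which this document does not confirm; the names BUSES and peak_bandwidth are made up for the example.

```python
# Peak bandwidth = bus width in bytes * clock rate.
# ASSUMPTION: one transfer per clock cycle (not confirmed by the text).
BUSES = {
    "South-Bridge (1T-SRAM <-> Flipper)": (64, 324_000_000),
    "North-Bridge (Gekko <-> Flipper)": (64, 162_000_000),
    "East-Bridge (ARAM <-> Flipper)": (8, 81_000_000),
}


def peak_bandwidth(width_bits, clock_hz):
    """Return the theoretical peak in bytes per second."""
    return (width_bits // 8) * clock_hz


for name, (width_bits, clock_hz) in BUSES.items():
    print(f"{name}: {peak_bandwidth(width_bits, clock_hz) / 1e6:.0f} MB/s")
```

Real throughput would be lower (or, with more than one transfer per clock, higher) than these numbers; they only show how width and clock combine.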

2.6  Details on the Macronix (MX) Chips


An outer inspection of the two MX chips does not reveal anything of much interest. Both chips are TSOP packages with 14 pins each. One of the chips has "CLK" inscribed on it, and the other "RTC". It can easily be inferred that the "CLK" chip functions as some sort of clock controller/generator and that the "RTC" chip contains the GameCube's Real-Time Clock unit. Two of the RTC chip's pins are connected to an external crystal which regulates the RTC's timing rate. Another pin is connected to a battery located on the controller board. At least two pins are used for VCC and GND. That leaves nine unknown pins. The RTC MX chip also contains the GameCube's BIOS. While 14 pins are not nearly enough for a parallel Flash ROM, EEPROM, mask ROM, etc., they are quite adequate for a serial connection.

2.7  DVD Protection


The DVD protection is based on a custom data format on an otherwise pretty standard DVD.

2.7.1  Filesystem


The custom filesystem (which is described elsewhere in this document) is not by itself related to the actual protection mechanism. However, since it is not standard, that alone already makes a disc hard to read (and create) in a regular (PC) environment.

2.7.2  Barcode


The Barcode is used to authenticate the Disc in the Drive.

2.7.3  Encryption


The entire content of a Gamecube DVD is XORed with a constant cyphertext and it is transparently decrypted by the Disc-Controller when reading from the DVD.
2.7.3.1   Cyphertext algorithm

todo

2.8  IPL/BIOS Encryption


If you XOR an NTSC BIOS with a PAL BIOS (or any other two different ones), you will notice that, because they have different sizes, there are some obviously zero-filled areas in one file, which give you plaintext of the other one. This suggests that both are encrypted by XORing with the same key. So we do the math:

given Ci = ciphertext, Cl = cleartext, K = key

encoding data goes like:

Ci1 = Cl1 ^ K
Ci2 = Cl2 ^ K

If Cl1 or Cl2 is nothing but zero, the resultant Ci is just K

decoding it would be:

Cl = Ci ^ K

for the areas where Cl is nothing but zeroes in one bios, we know K
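
The math above can be tried out with a few lines of Python. This is only a toy sketch: the key K and the two "BIOS images" are invented 16-byte values, and the helper xor is not part of any real tool.

```python
def xor(a, b):
    """XOR two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


K = bytes(range(0x40, 0x50))      # made-up key, unknown to the attacker
cl1 = b"PAL code" + bytes(8)      # image 1: code, then a zero-filled area
cl2 = bytes(8) + b"NTSC cod"      # image 2: zero-filled area, then code

ci1 = xor(cl1, K)                 # Ci1 = Cl1 ^ K
ci2 = xor(cl2, K)                 # Ci2 = Cl2 ^ K

# XORing the two ciphertexts cancels K. Wherever one image is zero,
# the cleartext of the other image shows through.
mixed = xor(ci1, ci2)
print(mixed)

# Where Cl1 is nothing but zeroes, Ci1 is just K ...
k_tail = ci1[8:]
# ... and that part of K decodes the same area of the other image.
print(xor(ci2[8:], k_tail))
```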

2.8.1  Flipper decryption logic bug


The hardware decryption logic has a really nasty bug which allows us to read almost the full Cleartext (and thus a large part of the cyphertext, by XORing it with encrypted data).

This, combined with the properties of XOR encryption, makes the whole encryption useless (or at least very insecure), and implementing a new BIOS is a straightforward task (provided that "high speed" (30MHz) programmable logic with enough memory attached to it is available).

The BIOS chip, which also includes SRAM and the RTC (but that won't matter here), is attached to the EXI0 bus. The EXI bus (nothing new here, just a refresher) is an SPI-like bus. SPI is nothing complicated, just four interesting lines:

- CS (used mainly for syncing, since you need a defined start point; it also lets you easily attach multiple devices (memory card, ...) to the same bus with separate CS lines)
- SI (aka MOSI, master out, slave in; the CPU is always master and the IPL chip is slave, so SI is GameCube -> device)
- SO (device -> GameCube, tristated when a device is not active)
- CLK (generated by the master)

A transfer is basically:

- lower CS (it's low active)
for every bit do:
- set SI bit
- clock
- read SO bit
then:
- put CS high again.

(The exact timing (when to sample SO, clock polarity) differs between SPI modes, and the one described here is not necessarily the one used in the GC. Anyway, it doesn't matter here.)
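
The transfer steps listed above can be sketched in Python against a simulated device. Everything here is hypothetical: InvertingSlave is a toy device invented for the example (it answers each SI bit with its inverse), and spi_transfer ignores clock polarity and sample timing entirely.

```python
class InvertingSlave:
    """Hypothetical toy device: answers every SI bit with its inverse on SO."""

    def __init__(self):
        self.selected = False

    def set_cs(self, level):
        self.selected = (level == 0)       # CS is low active

    def clock(self, si_bit):
        if not self.selected:
            raise RuntimeError("device not selected (SO would be tristated)")
        return si_bit ^ 1


def spi_transfer(slave, out_bits):
    """Send out_bits on SI and return the bits read on SO, one per clock."""
    slave.set_cs(0)                        # lower CS to start the transfer
    in_bits = []
    for si_bit in out_bits:                # for every bit:
        so_bit = slave.clock(si_bit)       #   set SI and clock
        in_bits.append(so_bit)             #   read the SO bit
    slave.set_cs(1)                        # put CS high again
    return in_bits


print(spi_transfer(InvertingSlave(), [1, 0, 1, 1]))
```

Note that the function cannot receive a bit without also sending one, which is the full-duplex property of the bus.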

So, based on that, we can transfer n-bit messages in BOTH DIRECTIONS. Technically this is implemented with a 32-bit shift register: with every clock cycle, one bit is shifted out (to SI) and one bit is shifted in (from SO). So after n clock cycles, you have shifted n bits out and have n new bits in the shift register. The protocol used on the bus is in most cases very simple, but device dependent. In the case of the IPL chip, it's the following:

GC -> IPL
1 bit read/write (0 for read, 1 for write, the latter only valid for RTC/Sram of course)
1 unknown bit
1 bit selection (0 for ROM, 1 for RTC/Sram)
23 bits address
6 bits dummy
after that, the data transfer starts. the 6 dummy cycles are mainly to give the IPL time to read out the first byte.
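
A minimal sketch of how such a 32-bit command word could be packed, assuming the fields are sent most significant bit first in the order listed above. That ordering is an assumption, although it agrees with the sniffed log later in this section, where address 0x100 appears on the wire as 00 00 40 00. The function name ipl_command and its parameters are invented for this example.

```python
def ipl_command(address, write=False, rtc_sram=False, unknown=0):
    """Pack the 32-bit word sent from the GC to the IPL chip.

    Assumed layout, MSB first: 1 bit read/write, 1 unknown bit,
    1 bit selection, 23 bits address, 6 dummy bits (left as zero).
    """
    if not 0 <= address < (1 << 23):
        raise ValueError("address must fit in 23 bits")
    return (
        (int(write) << 31)
        | ((unknown & 1) << 30)
        | (int(rtc_sram) << 29)
        | (address << 6)
    )


cmd = ipl_command(0x100)                   # read from ROM address 0x100
print(cmd.to_bytes(4, "big").hex(" "))
```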

So you send 32 bits of data (the "address") and start receiving the ROM bytes. But hey, we said the SPI bus always transfers 2 bits per clock cycle (in marketing terms), since it's full duplex (in technical terms). We transfer one bit TO the device, and one BACK. We HAVE to. There's no way to NOT send a bit. It doesn't matter, though: the bits sent from the IPL to the GC during the first 32 bits, for example, are just ignored. They would most probably contain only zeros or ones, or the bus might be tristated. It's simply not defined, so there's no data to be expected.

The same goes for the transfer of the data. The IPL chip puts the correct data on the SO line, but the GameCube sends dummy bits, too. Normally you would send zeros, ones, or whatever. It's ignored by the IPL chip anyway (unless it's a write, which would turn the whole thing upside down).

Now, technically the SPI port is implemented by a shift register of 32 bits. After transferring 32 bits, we have to read out the new value, store it into memory, and "start the next transfer". But what about CLEARING the register first? Yes, they didn't. In the next transfer, the last 32 bits are shifted out as dummy bits. Well, one might say, that's just the data that was shifted in, so it's completely uninteresting.

BUT: the decryption of the loader is done in hardware. It sits between the SO line of the IPL and the DI port of the shift register. (The decryption logic is built into the Flipper, so there's no way to intercept the content AFTER decryption.) So because the (decrypted) data that was just shifted in (and stored into memory) is shifted out again, we can get the decrypted data. If you sniff the SI line to the IPL chip, you will get a log like this:

00 00 40 00 (address written to the IPL, in this case: 0x100) 
FF FF FF FF (well just dummy data) 
xx xx xx xx (the data from the last 4 bytes, decrypted) 
... 
... 
xx xx xx xx (the data from the n-1 transfer, decrypted)

So in the end you get every 32-bit word except one: for every transferred block you miss 32 bits of plaintext data (the last word is never shifted back out), but you get the rest. This should be enough to decrypt huge parts of the BIOS, and thus recover a large part of K.
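
The leak can be modelled at word level in a few lines of Python. This is a simplified sketch, not a description of the real hardware: it works on whole 32-bit words instead of single bits, assumes the register holds all ones before the first data word (matching the FF FF FF FF line in the log), and uses invented cleartext and key values. The function name read_block is made up.

```python
def read_block(ciphertext_words, keystream_words):
    """Simulate one block read.

    Returns (words stored by the CPU, words a sniffer sees on SI).
    """
    shift_reg = 0xFFFFFFFF             # assumed contents before the first word
    stored, sniffed_si = [], []
    for ci, k in zip(ciphertext_words, keystream_words):
        sniffed_si.append(shift_reg)   # old contents leave on SI as "dummy" bits
        shift_reg = ci ^ k             # decrypted SO data enters the register
        stored.append(shift_reg)       # CPU stores the word; register is NOT cleared
    return stored, sniffed_si


cleartext = [0x7C0802A6, 0x9421FFF0, 0x48000005, 0x4E800020]   # invented
keystream = [0x13572468, 0x0BADF00D, 0xDEADBEEF, 0x00C0FFEE]   # invented
ciphertext = [cl ^ k for cl, k in zip(cleartext, keystream)]

stored, sniffed = read_block(ciphertext, keystream)
for word in sniffed:
    print(f"{word:08X}")
```

The sniffer sees the dummy word followed by every cleartext word except the last one of the block.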

2.8.2  Cyphertext algorithm


todo

2.8.3  replacing the IPL


Using the knowledge gained above, it is possible to create a small boot ROM replacement (using the as yet incomplete cyphertext) and get more (most) of the IPL cleartext.

The Gekko boots from 0x100; that's what you read in almost any PPC instruction manual: the reset vector. Well, this isn't the complete truth: it boots from its exception base + 0x100. The exception base is normally zero, BUT, as the PPC manual states, there's a bit in a HID (I think) register which turns the exception base to 0xFFF00000, and this bit is "set usually at boot time". So the processor starts to fetch instructions at 0xFFF00100. If you read a bit further, you'll notice that the CPU always reads 64 bits at once for code.

The memory at 0xFFF00000 is mapped inside the Flipper to an automated EXI transfer (with that shift register), with the decryption logic active. So the processor starts executing the decrypted instructions, reading 8 bytes at a time, of which we get 4 bytes in plain. That's not much (although enough to make some funny experiments, but that's another topic).

Luckily, the IPL itself (the cube menu) isn't executed this way, since that wouldn't be possible thanks to the "dumb" decryption logic. The first ~0x800 bytes read data out of the IPL chip, store it to memory (still using the hardware decryption logic), and jump there. They read 1024 bytes at once. Well, now we know 1020 bytes of each transfer, enough to have a complete block of code we can exchange: we have the ciphertext Ci = Cl^K on SO and the plaintext Cl (delayed by 32 bits) on SI, and can XOR them to get Ci^Cl = K. Now we can encrypt our own code with K.

So now we can make a small piece of code which just dumps the whole IPL, to the EXI bus or wherever you can receive it. Now we have all Cl, and thus we can compute all K; thus we can get the complete plaintext of all available IPLs as well as encode a larger custom IPL ourselves.
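
The step of pairing SO with the delayed SI data to get K, and then encrypting replacement code, might look like this in Python. It is a word-level sketch under assumptions: the captured values are invented, the payload words are arbitrary examples, and recover_keystream and encrypt are hypothetical helper names.

```python
def recover_keystream(so_words, si_words):
    """Ci for word n is on SO; its cleartext shows up on SI one word later."""
    return [ci ^ cl for ci, cl in zip(so_words, si_words[1:])]


def encrypt(code_words, keystream):
    """XOR our own code with the recovered K."""
    return [word ^ k for word, k in zip(code_words, keystream)]


# Invented capture of one 4-word transfer.
keystream = [0x13572468, 0x0BADF00D, 0xDEADBEEF, 0x00C0FFEE]   # hidden in hardware
cleartext = [0x7C0802A6, 0x9421FFF0, 0x48000005, 0x4E800020]
so_words = [cl ^ k for cl, k in zip(cleartext, keystream)]     # sniffed ciphertext
si_words = [0xFFFFFFFF] + cleartext[:-1]                       # sniffed, 32 bits late

recovered = recover_keystream(so_words, si_words)   # K for all words but the last

payload = [0x60000000, 0x60000000, 0x48000000]      # arbitrary example code words
patched = encrypt(payload, recovered)               # what a replacement ROM would hold

# The hardware XORs with the same K again, so the CPU sees our payload.
seen_by_cpu = [ci ^ k for ci, k in zip(patched, keystream)]
print([f"{word:08X}" for word in seen_by_cpu])
```

Because the last word of each transfer never reappears on SI, the recovered K is one word short per block, which is why the text speaks of 1020 known bytes out of 1024.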

a small note on why you can not recover the plaintext of the original loader this way:

The decryption logic is, whatever it is, a PRNG. It generates a key stream ("K") which has random properties (non-repeating, at least not within a range of some MB), but is always the same. It advances with every EXI transfer. The address is NOT used in the calculation. Thus reading from 0xFFF00100 more than once will give you a different result each time: the first time you get Ci(0)^K(0) (the correct result), the second time you get Ci(0)^K(1), etc., i.e. wrong results. Since we never get the K(n) for odd n, I see no chance of recovering it this way, even if we can read at 0xFFF00000+x (and we can do this if we don't set a specific bit to disable the logic).
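
A toy model of this behaviour, assuming only what is stated above: the key stream position advances once per transfer and the address plays no part in it. The class ToyDecryptor, the word-index addressing and all values are invented for illustration; the real PRNG is unknown.

```python
class ToyDecryptor:
    """Toy model: K advances on every transfer, the address is ignored."""

    def __init__(self, keystream):
        self.keystream = keystream
        self.position = 0                  # n in K(n)

    def read(self, rom, address):
        word = rom[address] ^ self.keystream[self.position]
        self.position += 1                 # advances regardless of the address
        return word


keystream = [0x13572468, 0x0BADF00D, 0xDEADBEEF]   # invented
cleartext = [0x7C0802A6, 0x9421FFF0, 0x48000005]   # invented
rom = [cl ^ k for cl, k in zip(cleartext, keystream)]

dec = ToyDecryptor(keystream)
first = dec.read(rom, 0)     # Ci(0) ^ K(0): the correct cleartext
second = dec.read(rom, 0)    # Ci(0) ^ K(1): garbage
print(f"{first:08X} {second:08X}")
```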