Yet Another Gamecube Documentation

3  Gekko CPU Overview



3.1  Registers


spr920 4 r/w HID2
31 24 23 16 15 8 7 0
               
bit(s)   description
2   PSE - Paired-Single mode enabled
1    
0   LSQE - Paired-Single quantized load and store instructions enabled
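
A minimal sketch of how these two bits could be handled from C. It assumes PowerPC bit numbering, where bit 0 is the most significant bit of the 32-bit register; the macro and function names are hypothetical, and the actual mfspr/mtspr access to SPR 920 is left out.

```c
#include <stdint.h>

/* Assumption: PowerPC (IBM) bit numbering, bit 0 = most significant bit. */
#define IBM_BIT(n)  (UINT32_C(1) << (31 - (n)))

#define HID2_LSQE   IBM_BIT(0)  /* HID2[LSQE], bit 0 */
#define HID2_PSE    IBM_BIT(2)  /* HID2[PSE],  bit 2 */

/* Return the given HID2 value with both PSE and LSQE set. */
static uint32_t hid2_enable_ps(uint32_t hid2)
{
    return hid2 | HID2_PSE | HID2_LSQE;
}

/* Non-zero when both PSE and LSQE are set in the given HID2 value. */
static int hid2_ps_ready(uint32_t hid2)
{
    return (hid2 & (HID2_PSE | HID2_LSQE)) == (HID2_PSE | HID2_LSQE);
}
```

Under that bit-numbering assumption, setting both bits in a cleared register gives 0xA0000000.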


3.2  Calling conventions


Parameters are passed in r3 (1st), r4 (2nd), r5 (3rd) and so on up to r10 (8th); further parameters are passed on the stack.
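
The register assignment can be sketched as a small lookup. This is only an illustration: it assumes the eight-register window r3 to r10 used by the PowerPC EABI for integer parameters, and the function and macro names are hypothetical.

```c
#define FIRST_ARG_REG 3   /* r3 carries the 1st parameter */
#define LAST_ARG_REG  10  /* assumption: r10 is the last parameter register */

/* GPR number that carries the index-th (1-based) integer parameter.
   Returns -1 when the index is invalid or the parameter no longer fits
   in a register and is therefore passed on the stack. */
static int arg_register(int index)
{
    int reg = FIRST_ARG_REG + index - 1;
    if (index < 1 || reg > LAST_ARG_REG)
        return -1;
    return reg;
}
```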

3.3  PPC Instructions



3.3.1  Integer Instructions


Mnemonic Opcode Description
addi    
addis    
add    
addo    
subf    
subfo    
addic    
subfic    
addc    
addco    
subfc    
subfco    
adde    
addeo    
subfe    
subfeo    
addme    
addmeo    
subfme    
subfmeo    
addze    
addzeo    
subfze    
subfzeo    
neg    
nego    
mulli    
mullw    
mullwo    
mulhw    
mulhwu    
divw    
divwo    
divwu    
divwuo    
cmpi    
cmp    
cmpli    
cmpl    


Mnemonic Opcode Description
andi    
andis    
ori    
oris    
xori    
xoris    
and    
or    
xor    
nand    
nor    
eqv    
andc    
orc    


Mnemonic Opcode Description
extsb    
extsh    
cntlzw    
rlwinm    
rlwnm    
rlwimi    
slw    
srw    
srawi    
sraw    


3.3.2  Floating-Point Instructions


Mnemonic Opcode Description
fadd    
fadds (*)    
fsub    
fsubs (*)    
fmul    
fmuls (*)    
fdiv    
fdivs (*)    
fres (*)    
frsqrte    
fsel (*)    
fmadd    
fmadds (*)    
fmsub    
fmsubs (*)    
fnmadd    
fnmadds (*)    
fnmsub    
fnmsubs (*)    
frsp (*)    
fctiw    
fctiwz    
fcmpu    
fcmpo    
mffs    
mcrfs    
mtfsfi    
mtfsf    
mtfsb0    
mtfsb1    
fmr (*)    
fneg    
fabs    
fnabs    

(*) - modified for paired singles

3.3.3  Integer Load and Store Instructions


Mnemonic Opcode Description
lbz    
lbzx    
lbzu    
lbzux    
lhz    
lhzx    
lhzu    
lhzux    
lha    
lhax    
lhau    
lhaux    
lwz    
lwzx    
lwzu    
lwzux    
stb    
stbx    
stbu    
stbux    
sth    
sthx    
sthu    
sthux    
stw    
stwx    
stwu    
stwux    
lhbrx    
lwbrx    
sthbrx    
stwbrx    
lmw    
stmw    
lswi    
lswx    
stswi    
stswx    


3.3.4  Floating-Point Load and Store Instructions


Mnemonic Opcode Description
lfs    
lfsx    
lfsu    
lfsux    
lfd    
lfdx    
lfdu    
lfdux    
stfs    
stfsx    
stfsu    
stfsux    
stfd    
stfdx    
stfdu    
stfdux    
stfiwx    


3.3.5  Branch Instructions


Mnemonic Opcode Description
b   unconditional branch
ba    
bl   branch and link
bla    
bc    
bca    
bcl    
bcla    
bclr    
bclrl    
bcctr    
bcctrl    


3.3.6  Condition Register Logical Instructions


Mnemonic Opcode Description
crand    
cror    
crxor    
crnand    
crnor    
creqv    
crandc    
crorc    
mcrf    


3.3.7  Misc Instructions


Mnemonic Opcode Description
twi    
tw    
sc    
rfi    
mtcrf    
mcrxr    
mfcr    
mtmsr    
mfmsr    
mtspr    
mfspr    
lwarx    
stwcx.    
sync    
mftb    
eieio    
isync    
dcbt    
dcbtst    
dcbz    
dcbz_l    
dcbst    
dcbf    
dcbi    
icbi    
eciwx    
ecowx    
mtsr    
mtsrin    
mfsr    
mfsrin    
tlbie    
tlbsync    


3.4  additional Gekko Instructions


In paired-single mode the Gekko offers some additional (and some modified) instructions that are useful for fast vector and matrix calculations. They are analogous to the Streaming SIMD Extensions (SSE) of Intel and other x86 processors. This extension is unique to the Gekko processor and is used to calculate two single-precision numbers ("floats" in C) in one clock cycle.

The floating-point registers (FPRs) of the Gekko are modified in the following way: one half holds the first single-precision number and the other half holds the second. These halves are named "PS0" and "PS1".

The PS instruction set is divided into two parts: Load and Store Quantization instructions and Paired-Single Arithmetic instructions. The Load and Store Quantization instructions are used for fast integer-float type casting and some specific memory operations on the PS0 and PS1 halves of an FPR.

If you try to execute any PS instruction without the HID2[PSE] and HID2[LSQE] bits set, an illegal instruction exception is generated.

3.4.1  FPR format in paired-single mode


63 56 55 48 47 40 39 32
1111 1111 1111 1111 1111 1111 1111 1111
31 24 23 16 15 8 7 0
0000 0000 0000 0000 0000 0000 0000 0000
bit(s)   description
32-63 1 PS1
0-31 0 PS0
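
As a rough model of this layout, the two halves can be packed into and unpacked from a 64-bit integer. The sketch reads the diagram literally, with bit 63 as the most significant bit of a plain uint64_t; the type and function names are hypothetical, and this is a model of the table, not a statement about how the hardware stores the values internally.

```c
#include <stdint.h>
#include <string.h>

typedef struct { float ps0, ps1; } ps_pair;

/* Raw IEEE-754 bit pattern of a float, and back (memcpy avoids aliasing issues). */
static uint32_t f2u(float f) { uint32_t u; memcpy(&u, &f, sizeof u); return u; }
static float    u2f(uint32_t u) { float f; memcpy(&f, &u, sizeof f); return f; }

/* Build the 64-bit image from the table: PS1 in bits 32-63, PS0 in bits 0-31. */
static uint64_t fpr_pack(ps_pair p)
{
    return ((uint64_t)f2u(p.ps1) << 32) | f2u(p.ps0);
}

/* Split a 64-bit image back into its two single-precision halves. */
static ps_pair fpr_unpack(uint64_t raw)
{
    ps_pair p;
    p.ps0 = u2f((uint32_t)raw);
    p.ps1 = u2f((uint32_t)(raw >> 32));
    return p;
}
```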


3.4.2  Arithmetic Instructions


Mnemonic Opcode Description
ps_abs 000100 DDDDD 00000 BBBBB 01000 01000 R absolute value
ps_add 000100 DDDDD AAAAA BBBBB 00000 10101 R add
ps_cmpo0 000100 DDD00 AAAAA BBBBB 00001 00000 0 compare ordered high
ps_cmpo1 000100 DDD00 AAAAA BBBBB 00011 00000 0 compare ordered low
ps_cmpu0 000100 DDD00 AAAAA BBBBB 00000 00000 0 compare unordered high
ps_cmpu1 000100 DDD00 AAAAA BBBBB 00010 00000 0 compare unordered low
ps_div 000100 DDDDD AAAAA BBBBB 00000 10010 R divide
ps_merge00 000100 DDDDD AAAAA BBBBB 10000 10000 R merge high
ps_merge01 000100 DDDDD AAAAA BBBBB 10001 10000 R merge direct
ps_merge10 000100 DDDDD AAAAA BBBBB 10010 10000 R merge swapped
ps_merge11 000100 DDDDD AAAAA BBBBB 10011 10000 R merge low
ps_mr 000100 DDDDD 00000 BBBBB 00010 01000 R move register
ps_nabs 000100 DDDDD 00000 BBBBB 00100 01000 R negate absolute value
ps_neg 000100 DDDDD 00000 BBBBB 00001 01000 R negate
ps_res 000100 DDDDD 00000 BBBBB 00000 11000 R reciprocal estimate
ps_rsqrte 000100 DDDDD 00000 BBBBB 00000 11010 R reciprocal square root estimate
ps_sub 000100 DDDDD AAAAA BBBBB 00000 10100 R subtract
ps_madd 000100 DDDDD AAAAA BBBBB CCCCC 11101 R multiply and add
ps_madds0 000100 DDDDD AAAAA BBBBB CCCCC 01110 R multiply and add scalar high
ps_madds1 000100 DDDDD AAAAA BBBBB CCCCC 01111 R multiply and add scalar low
ps_msub 000100 DDDDD AAAAA BBBBB CCCCC 11100 R multiply and subtract
ps_mul 000100 DDDDD AAAAA 00000 CCCCC 11001 R multiply
ps_muls0 000100 DDDDD AAAAA 00000 CCCCC 01100 R multiply scalar high
ps_muls1 000100 DDDDD AAAAA 00000 CCCCC 01101 R multiply scalar low
ps_nmadd 000100 DDDDD AAAAA BBBBB CCCCC 11111 R negative multiply and add
ps_nmsub 000100 DDDDD AAAAA BBBBB CCCCC 11110 R negative multiply and subtract
ps_sel 000100 DDDDD AAAAA BBBBB CCCCC 10111 R select
ps_sum0 000100 DDDDD AAAAA BBBBB CCCCC 01010 R vector sum high
ps_sum1 000100 DDDDD AAAAA BBBBB CCCCC 01011 R vector sum low

Note: the R opcode field (comparison of the result with zero) is unused (= 0).
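
The field layout in the table can be turned into a small encoder. This is a sketch only: the function names are hypothetical, the arguments are taken in the table's field order (D, A, B, C), and the compare instructions with their DDD00 field are not covered.

```c
#include <stdint.h>

/* Assemble a paired-single arithmetic word from the table's fields:
   000100 DDDDD AAAAA BBBBB CCCCC xxxxx R, with R left at 0. */
static uint32_t ps_encode(unsigned d, unsigned a, unsigned b,
                          unsigned c, unsigned xo5)
{
    return (UINT32_C(4) << 26)                 /* primary opcode 000100 */
         | ((uint32_t)(d   & 31u) << 21)
         | ((uint32_t)(a   & 31u) << 16)
         | ((uint32_t)(b   & 31u) << 11)
         | ((uint32_t)(c   & 31u) << 6)
         | ((uint32_t)(xo5 & 31u) << 1);       /* R bit stays 0 */
}

/* ps_add: C field is 00000, the last five opcode bits are 10101. */
static uint32_t encode_ps_add(unsigned d, unsigned a, unsigned b)
{
    return ps_encode(d, a, b, 0, 0x15);
}

/* ps_madd: all four register fields used, last five opcode bits 11101. */
static uint32_t encode_ps_madd(unsigned d, unsigned a, unsigned b, unsigned c)
{
    return ps_encode(d, a, b, c, 0x1D);
}
```

For example, D=1, A=2, B=3 for ps_add gives the word 0x1022182A.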
3.4.2.1   PS_ABS

absolute value Clear bit 0 of PS0[B] and copy result to PS0[D]
Clear bit 0 of PS1[B] and copy result to PS1[D]
3.4.2.2   PS_ADD

add PS0[D] = PS0[A] + PS0[B]
PS1[D] = PS1[A] + PS1[B]
3.4.2.3   PS_CMPO0

compare ordered high "c" holds the result of the comparison
If (PS0[A] is NaN or PS0[B] is NaN) then c = 0001b
Else if (PS0[A] < PS0[B]) then c = 1000b
Else if (PS0[A] > PS0[B]) then c = 0100b
Else c = 0010b
Save result in D field of condition register (CR[D] = c).
3.4.2.4   PS_CMPO1

compare ordered low "c" holds the result of the comparison
If (PS1[A] is NaN or PS1[B] is NaN) then c = 0001b
Else if (PS1[A] < PS1[B]) then c = 1000b
Else if (PS1[A] > PS1[B]) then c = 0100b
Else c = 0010b
Save result in D field of condition register (CR[D] = c).
3.4.2.5   PS_CMPU0

compare unordered high "c" holds the result of the comparison
If (PS0[A] is NaN or PS0[B] is NaN) then c = 0001b
Else if (PS0[A] < PS0[B]) then c = 1000b
Else if (PS0[A] > PS0[B]) then c = 0100b
Else c = 0010b
Save result in D field of condition register (CR[D] = c).
3.4.2.6   PS_CMPU1

compare unordered low "c" holds the result of the comparison
If (PS1[A] is NaN or PS1[B] is NaN) then c = 0001b
Else if (PS1[A] < PS1[B]) then c = 1000b
Else if (PS1[A] > PS1[B]) then c = 0100b
Else c = 0010b
Save result in D field of condition register (CR[D] = c).
These four compare instructions look identical here because I omitted the
FPSCR details, which are not needed to follow the result.
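
The compare logic above can be modelled in C as follows. This is a sketch with hypothetical names; like the descriptions, it ignores FPSCR, so one function per slot covers both the ordered and the unordered variant. Placing CR field 0 in the four most significant bits of the register image follows the usual PowerPC convention and is an assumption here.

```c
#include <math.h>
#include <stdint.h>

typedef struct { float ps0, ps1; } ps_pair;

/* 4-bit result: 1000b less, 0100b greater, 0010b equal, 0001b NaN operand. */
static unsigned fp_compare(float a, float b)
{
    if (isnan(a) || isnan(b)) return 0x1;
    if (a < b)                return 0x8;
    if (a > b)                return 0x4;
    return 0x2;
}

/* ps_cmpo0 / ps_cmpu0 compare the PS0 halves, ps_cmpo1 / ps_cmpu1 the PS1 halves. */
static unsigned ps_cmp0(ps_pair a, ps_pair b) { return fp_compare(a.ps0, b.ps0); }
static unsigned ps_cmp1(ps_pair a, ps_pair b) { return fp_compare(a.ps1, b.ps1); }

/* Store c in 4-bit field crf (0..7) of a 32-bit CR image.
   Assumption: field 0 occupies the four most significant bits. */
static uint32_t cr_set_field(uint32_t cr, unsigned crf, unsigned c)
{
    unsigned shift = 4u * (7u - (crf & 7u));
    return (cr & ~(UINT32_C(0xF) << shift)) | ((uint32_t)(c & 0xFu) << shift);
}
```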
3.4.2.7   PS_DIV

divide PS0[D] = PS0[A] / PS0[B]
PS1[D] = PS1[A] / PS1[B]
3.4.2.8   PS_MERGE00

merge high PS0[D] = PS0[A]
PS1[D] = PS0[B]
3.4.2.9   PS_MERGE01

merge direct PS0[D] = PS0[A]
PS1[D] = PS1[B]
3.4.2.10   PS_MERGE10

merge swapped PS0[D] = PS1[A]
PS1[D] = PS0[B]
3.4.2.11   PS_MERGE11

merge low PS0[D] = PS1[A]
PS1[D] = PS1[B]
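
The four merge variants differ only in which half of A and B they pick. A minimal C model (the struct and function names are hypothetical):

```c
typedef struct { float ps0, ps1; } ps_pair;

/* Each function returns the new contents of D for the given A and B. */
static ps_pair ps_merge00(ps_pair a, ps_pair b) { ps_pair d = { a.ps0, b.ps0 }; return d; }
static ps_pair ps_merge01(ps_pair a, ps_pair b) { ps_pair d = { a.ps0, b.ps1 }; return d; }
static ps_pair ps_merge10(ps_pair a, ps_pair b) { ps_pair d = { a.ps1, b.ps0 }; return d; }
static ps_pair ps_merge11(ps_pair a, ps_pair b) { ps_pair d = { a.ps1, b.ps1 }; return d; }
```

Passing the same register as A and B to ps_merge10 swaps its two halves, and ps_merge00 with the same register broadcasts PS0 into both halves.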
3.4.2.12   PS_MR

move register PS0[D] = PS0[B]
PS1[D] = PS1[B]
3.4.2.13   PS_NABS

negate absolute value Set bit 0 of PS0[B] and copy result to PS0[D]
Set bit 0 of PS1[B] and copy result to PS1[D]
3.4.2.14   PS_NEG

negate Invert bit 0 of PS0[B] and copy result to PS0[D]
Invert bit 0 of PS1[B] and copy result to PS1[D]
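
ps_abs, ps_nabs and ps_neg all act on bit 0 only. The sketch below models them in C, assuming PowerPC bit numbering so that bit 0 is the most significant bit of each 32-bit half, which is the IEEE-754 sign bit. All names are hypothetical.

```c
#include <stdint.h>
#include <string.h>

typedef struct { float ps0, ps1; } ps_pair;

/* Assumption: bit 0 (PowerPC numbering) is the MSB, i.e. the sign bit. */
#define SIGN_BIT UINT32_C(0x80000000)

static uint32_t f2u(float f) { uint32_t u; memcpy(&u, &f, sizeof u); return u; }
static float    u2f(uint32_t u) { float f; memcpy(&f, &u, sizeof f); return f; }

/* Clear the sign bit of both halves. */
static ps_pair ps_abs(ps_pair b)
{
    ps_pair d = { u2f(f2u(b.ps0) & ~SIGN_BIT), u2f(f2u(b.ps1) & ~SIGN_BIT) };
    return d;
}

/* Set the sign bit of both halves. */
static ps_pair ps_nabs(ps_pair b)
{
    ps_pair d = { u2f(f2u(b.ps0) | SIGN_BIT), u2f(f2u(b.ps1) | SIGN_BIT) };
    return d;
}

/* Invert the sign bit of both halves. */
static ps_pair ps_neg(ps_pair b)
{
    ps_pair d = { u2f(f2u(b.ps0) ^ SIGN_BIT), u2f(f2u(b.ps1) ^ SIGN_BIT) };
    return d;
}
```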
3.4.2.15   PS_RES

reciprocal estimate PS0[D] = 1 / PS0[B]
PS1[D] = 1 / PS1[B]
3.4.2.16   PS_RSQRTE

reciprocal square root estimate PS0[D] = 1 / SQRT(PS0[B])
PS1[D] = 1 / SQRT(PS1[B])
3.4.2.17   PS_SUB

subtract PS0[D] = PS0[A] - PS0[B]
PS1[D] = PS1[A] - PS1[B]
3.4.2.18   PS_MADD

multiply-add PS0[D] = PS0[A] * PS0[C] + PS0[B]
PS1[D] = PS1[A] * PS1[C] + PS1[B]
3.4.2.19   PS_MADDS0

multiply-add scalar high PS0[D] = PS0[A] * PS0[C] + PS0[B]
PS1[D] = PS1[A] * PS0[C] + PS1[B]
3.4.2.20   PS_MADDS1

multiply-add scalar low PS0[D] = PS0[A] * PS1[C] + PS0[B]
PS1[D] = PS1[A] * PS1[C] + PS1[B]
3.4.2.21   PS_MSUB

multiply-subtract PS0[D] = PS0[A] * PS0[C] - PS0[B]
PS1[D] = PS1[A] * PS1[C] - PS1[B]
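
The multiply-add variants above differ only in which half of C is used as the multiplier. A minimal C model with hypothetical names follows; arguments are taken in A, B, C order, and the hardware's rounding behaviour is not modelled.

```c
typedef struct { float ps0, ps1; } ps_pair;

/* ps_madd: each half of A is multiplied by the matching half of C. */
static ps_pair ps_madd(ps_pair a, ps_pair b, ps_pair c)
{
    ps_pair d = { a.ps0 * c.ps0 + b.ps0, a.ps1 * c.ps1 + b.ps1 };
    return d;
}

/* ps_madds0: both halves of A are multiplied by PS0 of C. */
static ps_pair ps_madds0(ps_pair a, ps_pair b, ps_pair c)
{
    ps_pair d = { a.ps0 * c.ps0 + b.ps0, a.ps1 * c.ps0 + b.ps1 };
    return d;
}

/* ps_madds1: both halves of A are multiplied by PS1 of C. */
static ps_pair ps_madds1(ps_pair a, ps_pair b, ps_pair c)
{
    ps_pair d = { a.ps0 * c.ps1 + b.ps0, a.ps1 * c.ps1 + b.ps1 };
    return d;
}

/* ps_msub: like ps_madd, but B is subtracted from the product. */
static ps_pair ps_msub(ps_pair a, ps_pair b, ps_pair c)
{
    ps_pair d = { a.ps0 * c.ps0 - b.ps0, a.ps1 * c.ps1 - b.ps1 };
    return d;
}
```

The scalar forms are what makes matrix work convenient: one half of C acts as a scalar applied to a two-element column held in A, accumulated into B.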
3.4.2.22   PS_MUL

multiply PS0[D] = PS0[A] * PS0[C]
PS1[D] = PS1[A] * PS1[C]
3.4.2.23   PS_MULS0

multiply scalar high PS0[D] = PS0[A] * PS0[C]
PS1[D] = PS1[A] * PS0[C]
3.4.2.24   PS_MULS1

multiply scalar low PS0[D] = PS0[A] * PS1[C]
PS1[D] = PS1[A] * PS1[C]
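
A small C model of the three multiply instructions, assuming (as their names indicate) that each half of D receives a product. The struct and function names are hypothetical.

```c
typedef struct { float ps0, ps1; } ps_pair;

/* ps_mul: half-by-half product of A and C. */
static ps_pair ps_mul(ps_pair a, ps_pair c)
{
    ps_pair d = { a.ps0 * c.ps0, a.ps1 * c.ps1 };
    return d;
}

/* ps_muls0: both halves of A scaled by PS0 of C. */
static ps_pair ps_muls0(ps_pair a, ps_pair c)
{
    ps_pair d = { a.ps0 * c.ps0, a.ps1 * c.ps0 };
    return d;
}

/* ps_muls1: both halves of A scaled by PS1 of C. */
static ps_pair ps_muls1(ps_pair a, ps_pair c)
{
    ps_pair d = { a.ps0 * c.ps1, a.ps1 * c.ps1 };
    return d;
}
```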
3.4.2.25   PS_NMADD

negative multiply-add PS0[D] = - (PS0[A] * PS0[C] + PS0[B])
PS1[D] = - (PS1[A] * PS1[C] + PS1[B])
3.4.2.26   PS_NMSUB

negative multiply-subtract PS0[D] = - (PS0[A] * PS0[C] - PS0[B])
PS1[D] = - (PS1[A] * PS1[C] - PS1[B])
3.4.2.27   PS_SEL

select If (PS0[A] >= 0) then PS0[D] = PS0[C] else PS0[D] = PS0[B]
If (PS1[A] >= 0) then PS1[D] = PS1[C] else PS1[D] = PS1[B]
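
ps_sel is useful for branch-free code. The sketch below models it in C and builds a per-half maximum on top of it; the names are hypothetical, arguments are taken in A, B, C order, and ps_max is an illustration rather than an instruction.

```c
typedef struct { float ps0, ps1; } ps_pair;

/* Per half: pick C when A >= 0, otherwise pick B. */
static ps_pair ps_sel(ps_pair a, ps_pair b, ps_pair c)
{
    ps_pair d;
    d.ps0 = (a.ps0 >= 0.0f) ? c.ps0 : b.ps0;
    d.ps1 = (a.ps1 >= 0.0f) ? c.ps1 : b.ps1;
    return d;
}

/* Per-half maximum without branches on the data path:
   x - y >= 0 selects x, otherwise y. */
static ps_pair ps_max(ps_pair x, ps_pair y)
{
    ps_pair diff = { x.ps0 - y.ps0, x.ps1 - y.ps1 };
    return ps_sel(diff, y, x);
}
```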
3.4.2.28   PS_SUM0

vector sum high PS0[D] = PS0[A] + PS1[B]
PS1[D] = PS1[C]
3.4.2.29   PS_SUM1

vector sum low PS0[D] = PS0[C]
PS1[D] = PS0[A] + PS1[B]
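
Both sum instructions add PS0 of A to PS1 of B and differ only in where the sum lands. A minimal C model (hypothetical names, arguments in A, B, C order):

```c
typedef struct { float ps0, ps1; } ps_pair;

/* ps_sum0: the sum goes to PS0 of D, PS1 of D is copied from C. */
static ps_pair ps_sum0(ps_pair a, ps_pair b, ps_pair c)
{
    ps_pair d = { a.ps0 + b.ps1, c.ps1 };
    return d;
}

/* ps_sum1: PS0 of D is copied from C, the sum goes to PS1 of D. */
static ps_pair ps_sum1(ps_pair a, ps_pair b, ps_pair c)
{
    ps_pair d = { c.ps0, a.ps0 + b.ps1 };
    return d;
}
```

Using the same register for A, B and C in ps_sum0 leaves the sum of its two halves in PS0, which is the final step of a two-element dot product.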

3.4.3  Load and Store Instructions


Mnemonic Opcode Description
psq_lx 000100 DDDDD AAAAA BBBBB WIII 000110 0 Paired Singles Quantized Load indexed
psq_lux 000100 DDDDD AAAAA BBBBB WIII 100110 0 Paired Singles Quantized Load with Update indexed
psq_stx 000100 SSSSS AAAAA BBBBB WIII 000111 0 Paired Singles Quantized Store indexed
psq_stux 000100 SSSSS AAAAA BBBBB WIII 100111 0 Paired Singles Quantized Store with Update indexed


Mnemonic Opcode Description
psq_l 111000 DDDDD AAAAA WIII dddddddddddd Paired Singles Quantized Load
psq_lu 111001 DDDDD AAAAA WIII dddddddddddd Paired Singles Quantized Load with Update
psq_st 111100 SSSSS AAAAA WIII dddddddddddd Paired Singles Quantized Store
psq_stu 111101 SSSSS AAAAA WIII dddddddddddd Paired Singles Quantized Store with Update
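
The non-indexed forms in the second table can be assembled with a few shifts. This is a sketch with hypothetical names. The W and III fields are copied verbatim because their meaning is not described here, and treating the 12-bit d field as a two's-complement displacement is an assumption.

```c
#include <stdint.h>

/* Primary opcodes from the table (six leading bits). */
enum {
    OP_PSQ_L   = 0x38,  /* 111000 */
    OP_PSQ_LU  = 0x39,  /* 111001 */
    OP_PSQ_ST  = 0x3C,  /* 111100 */
    OP_PSQ_STU = 0x3D   /* 111101 */
};

/* Layout: oooooo DDDDD AAAAA W III dddddddddddd
   (DDDDD is SSSSS for the store forms). */
static uint32_t psq_encode(unsigned op6, unsigned ds, unsigned a,
                           unsigned w, unsigned i, int d12)
{
    return ((uint32_t)(op6 & 63u) << 26)
         | ((uint32_t)(ds  & 31u) << 21)
         | ((uint32_t)(a   & 31u) << 16)
         | ((uint32_t)(w   & 1u)  << 15)
         | ((uint32_t)(i   & 7u)  << 12)
         | ((uint32_t)d12 & 0xFFFu);   /* keep the low 12 bits only */
}
```

For example, psq_l with D=1, A=3 and W, III and d all zero encodes as 0xE0230000.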

3.4.3.1   psq_lx

Paired Singles Quantized Load indexed
3.4.3.2   psq_lux

Paired Singles Quantized Load with Update indexed
3.4.3.3   psq_stx

Paired Singles Quantized Store indexed
3.4.3.4   psq_stux

Paired Singles Quantized Store with Update indexed
3.4.3.5   psq_l

Paired Singles Quantized Load
3.4.3.6   psq_lu

Paired Singles Quantized Load with Update
3.4.3.7   psq_st

Paired Singles Quantized Store
3.4.3.8   psq_stu

Paired Singles Quantized Store with Update

3.4.4  modified floating point instructions


In paired-single mode (HID2[PSE] = 1), all double-precision floating-point instructions remain valid and execute as they do outside paired-single mode. All single-precision floating-point instructions (fadds, fsubs, fmuls, fdivs, fmadds, fmsubs, fnmadds, fnmsubs, fres, frsp) change their meaning and operate on the PS0 operand.

Mnemonic Opcode Description
fadds    
fsubs    
fmuls    
fdivs    
fmadds    
fmsubs    
fnmadds    
fnmsubs    
fres    
frsp    
fsel    
fmr    

3.4.4.1   fadds

3.4.4.2   fsubs

3.4.4.3   fmuls

3.4.4.4   fdivs

3.4.4.5   fmadds

3.4.4.6   fmsubs

3.4.4.7   fnmadds

3.4.4.8   fnmsubs

3.4.4.9   fres

3.4.4.10   frsp

3.4.4.11   fsel

3.4.4.12   fmr


3.5  Programming Tips and additional information



3.5.1  Machine State Register


to do

3.5.2  Caches


to do

3.5.3  branch unit


To flush the branch unit's dynamic prediction logic, you must execute three branches in sequence:

        .... 
        b label1 
label1: b label2 
label2: b label3 
label3: 
        .... 
 