yet another PlayStationPortable Documentation

4  CPU Overview



4.1  Registers


32 32-bit General Purpose Integer Registers (R0-R31)
0 zero wired zero
1 at assembler temp
2 v0 return value
3 v1  
4 a0 argument registers
5 a1  
6 a2  
7 a3  
8 t0 caller saved (o32 old style names: default)
9 t1  
10 t2  
11 t3  
12 t4 caller saved
13 t5  
14 t6  
15 t7  
16 s0 callee saved
17 s1  
18 s2  
19 s3  
20 s4  
21 s5  
22 s6  
23 s7  
24 t8 caller saved
25 t9  
26 k0 kernel temporary
27 k1  
28 gp global pointer
29 sp stack pointer
30 fp/s8 frame pointer
31 ra return address
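
As an illustration, the table above can be turned into a small lookup helper. This is only a sketch in Python; the names `GPR_NAMES`, `GPR_ALIASES` and `gpr_number` are hypothetical and not part of any PSP toolchain.

```python
# o32-style register names from the table above, indexed by register number.
GPR_NAMES = [
    "zero", "at", "v0", "v1", "a0", "a1", "a2", "a3",
    "t0", "t1", "t2", "t3", "t4", "t5", "t6", "t7",
    "s0", "s1", "s2", "s3", "s4", "s5", "s6", "s7",
    "t8", "t9", "k0", "k1", "gp", "sp", "fp", "ra",
]

# Register 30 is listed under two names (fp/s8).
GPR_ALIASES = {"s8": 30}


def gpr_number(name):
    """Return the register number (0..31) for an o32-style name.

    A leading '$' or '%' is ignored. Unknown names raise ValueError.
    """
    key = name.lstrip("$%").lower()
    if key in GPR_ALIASES:
        return GPR_ALIASES[key]
    return GPR_NAMES.index(key)
```

For example, `gpr_number("sp")` gives 29, and both `gpr_number("fp")` and `gpr_number("s8")` give 30.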


4.2  Debug Registers


0 DRCNTL Debug Register Control register
1 DEPC Debug Exception PC register
2 DDATA0 Debug Data Monitor 0 and Monitor Data register
3 DDATA1 Debug Data Monitor 1 register
4 IBC Instruction Breakpoint Control/Status register
5 DBC Data Breakpoint Control/Status register
6 DR6 Reserved
7 DR7 Reserved
8 IBA Instruction Breakpoint Address register
9 IBAM Instruction Breakpoint Address Mask register
10 DR10 Reserved
11 DR11 Reserved
12 DBA Data Breakpoint Address register
13 DBAM Data Breakpoint Address Mask register
14 DBD Data Breakpoint Data register
15 DBDM Data Breakpoint Data Mask register
16 DR16 Undefined
17 DR17 Undefined
18 DR18 Undefined
19 DR19 Undefined
20 DR20 Undefined
21 DR21 Undefined
22 DR22 Undefined
23 DR23 Undefined
24 DR24 Undefined
25 DR25 Undefined
26 DR26 Undefined
27 DR27 Undefined
28 DR28 Undefined
29 DR29 Undefined
30 DR30 Undefined
31 DR31 Undefined


4.3  COP0 (System Control)



4.3.1  Status Registers (mfc/mtc)



0 -   not available (TLB)  
1 -   not available (TLB)  
2 -   not available (TLB)  
3 -   not available (TLB)  
4 -   not available (TLB context)  
5 -   not available (TLB)  
6 -   not available (TLB)  
7     ?  
8 r BadVaddr virtual address of last error/exception sysmem
9 r/w Count system counter interruptman,sysmem
10 -   not available (TLB)  
11 r/w Compare counter comparison value interruptman,sysmem
12 r/w Status system status threadman, reboot, mewrapper,mebooterumdvideo,mebooter,loadcore,interruptman, loadexec, exceptionman,sysmem
13 r/w Cause exception cause threadman, mewrapper,mebooterumdvideo,mebooter,interruptman,exceptionman,sysmem
14 r/w EPC exception program counter loadcore,interruptman,exceptionman,sysmem
15 r PRId processor revision id interruptman,sysmem
16 r/w Config configuration utils,reboot,mewrapper,mebooterumdvideo,mebooter,loadcore,sysmem
17     ?  
18     ? Watch LO  
19     ? Watch HI  
20 -   not available (TLB XContext)  
21 r SCCode syscall code << 2 interruptman
22 r CPUId CPU ID (0=Main, 1=ME) threadman, sysreg, reboot,loadcore,interruptman,exceptionman,sysmem
23     ?  
24     ?  
25 r/w EBase virtual address of exception vector threadman, exceptionman,sysmem
26     ? Cache ECC  
27     ? Cache Error  
28 r/w TagLo cache instruction register utils,reboot,mewrapper,mebooterumdvideo,mebooter,sysmem
29 r/w TagHi cache instruction register utils,reboot,mewrapper,mebooterumdvideo,mebooter,sysmem
30 r/w ErrorEPC error exception program counter exceptionman,sysmem
31     ?  


4.3.2  Control Registers (cfc/ctc)


num         used by  
0 COP0.EPC   context   EBase Handler, general exception handler, error handler,syscall handler sysmem,interruptman, exceptionman
1 COP0.EPC.err 0xbfc00000 context   error (HW,SW,NMI) exception handler, error handler sysmem,exceptionman
2 COP0.Status   context   EBase Handler, general exception handler,syscall handler sysmem,interruptman, exceptionman
3 COP0.Cause   context   EBase Handler, general exception handler,syscall handler sysmem,interruptman, exceptionman
4 GPR.v0   context saved v0 general exception handler ,syscall handler sysmem,interruptman, exceptionman
5 GPR.v1   context saved v1 general exception handler sysmem,interruptman, exceptionman
6 GPR.v0.err 0xbfc00000 context saved v0 error (HW,SW,NMI) exception handler, EBase Handler sysmem,exceptionman
7 GPR.v1.err 0xbfc00000 context saved v1 error (HW,SW,NMI) exception handler, EBase Handler sysmem,exceptionman
8 EXC_TABLE   vector table Exception vector table addr general exception handler sysmem,exceptionman(init)
9 EXC_31_ERROR 0xbfc00000 vector Error handler addr error (HW,SW,NMI) exception handler sysmem,exceptionman(init)
10 EXC_27_DEBUG 0xbfc01000 vector Debug handler addr debug exception handler sysmem,exceptionman
11 EXC_8_SYSCALL   vector Syscall handler addr EBase Handler, register/release exception handler functions sysmem,exceptionman
12 SC_TABLE   vector table (1st) syscalls table addr syscall handler sysmem,interruptman(init),
13 SC_MAX   int (1st) max syscall code syscall handler sysmem,interruptman(init),
14 GPR.sp.Kernel   context Stackpointer Kernel   sysmem,threadman (init), interruptman,
15 GPR.sp.User   context   syscall handler sysmem,threadman (init), interruptman,
16 CurrentTCB   context   syscall handler sysmem,threadman (init), interruptman,
17 ?       ? sysmem
18 NMI_TABLE 0xbfc00000 vector table NMI vector table addr error handler sysmem,exceptionman(init)
19 COP0.Status.err 0xbfc00000 context   EBase Handler, error (HW,SW,NMI) exception handler sysmem,exceptionman
20 COP0.Cause.err 0xbfc00000 context   error (HW,SW,NMI) exception handler sysmem,exceptionman
21 ?       ? sysmem
22 ?       ? sysmem
23 ? GPR.v0   ? context   ? sysmem
24 ? GPR.v1   ? context   ? sysmem
25 PROFILER_BASE   vector profiler hw base addr general exception handler sysmem,threadman, interruptman, exceptionman
26 GPR.v0.dbg 0xbfc01000 context   debug exception handler sysmem,exceptionman
27 GPR.v1.dbg 0xbfc01000 context   debug exception handler sysmem,exceptionman
28 DBGENV 0xbfc01000 vector debug handler env addr debug exception handler sysmem,exceptionman
29 ?       ? sysmem
30 ?       ? sysmem
31 ?       ? sysmem


4.4  COP1 (FPU)


32 32-bit General Purpose Floating-Point Registers (FPR0-FPR31)

4.4.1  Status Registers (mfc/mtc)


0     vshmain,video_plugin,update_plugin,sysreg,semawm,savedata_plugin,photo_plugin, paf,pafmini,osk_plugin,opening_plugin,netplay_client_plugin,music_plugin,msvideo_plugin,lcdc,impose_plugin,auth_plugin,common_gui,dialogmain,
1     vshmain,video_plugin,update_plugin,sysreg,sysclib,savedata_utility,savedata_plugin,power,photo_plugin, paf,pafmini,osk_plugin,opening_plugin,netplay_server_utility,netconf_plugin,music_plugin,msvideo_plugin,lcdc,impose_plugin,dialogmain,
2     video_plugin,sysreg,photo_plugin, paf,pafmini,osk_plugin,music_plugin,msvideo_plugin,lcdc,
3     video_plugin,sysreg,photo_plugin, paf,pafmini,music_plugin,
4     vshmain,video_plugin,paf,pafmini,dialogmain,
5     video_plugin,sysreg,photo_plugin, paf,pafmini,
6     paf,pafmini,
7      
8     video_plugin,paf,pafmini,
9     paf,pafmini,
10      
11      
12     vshmain,video_plugin,update_plugin,sysconf_plugin,sysclib,savedata_utility,savedata_plugin,savedata_auto_dialog,photo_plugin, paf,pafmini,opening_plugin,netplay_client_plugin,netconf_plugin,music_plugin,msvideo_plugin,auth_plugin,common_gui,dialogmain,game_plugin,
13     vshmain,update_plugin,sysconf_plugin,savedata_utility,savedata_plugin,photo_plugin, paf,pafmini,osk_plugin,netplay_client_plugin,netconf_plugin,music_plugin,msvideo_plugin,game_plugin,common_gui,
14     vshmain,video_plugin,sysconf_plugin,savedata_utility,savedata_plugin,photo_plugin, paf,pafmini,music_plugin,msvideo_plugin,game_plugin,
15     syscon
16     paf,pafmini,
17      
18      
19      
20     vshmain,video_plugin,sysconf_plugin,savedata_plugin,photo_plugin, paf,pafmini,osk_plugin,music_plugin,msvideo_plugin,impose_plugin,game_plugin,common_gui,dialogmain,
21     video_plugin,photo_plugin, paf,pafmini,osk_plugin,music_plugin,msvideo_plugin,game_plugin,common_gui,
22     sysconf_plugin,photo_plugin, paf,pafmini,music_plugin,msvideo_plugin,game_plugin,common_gui,
23     photo_plugin, paf,pafmini,
24     paf,pafmini,
25     paf,pafmini,
26      
27      
28      
29      
30      
31      


4.4.2  Control Registers (cfc/ctc)


0 FIR Floating Point Implementation Register sysmem
1 FCR1    
2 FCR2    
3 FCR3    
4 FCR4    
5 FCR5    
6 FCR6   interrupt handler
7 FCR7    
8 FCR8    
9 FCR9    
10 FCR10    
11 FCR11    
12 FCR12    
13 FCR13    
14 FCR14    
15 FCR15    
16 FCR16    
17 FCR17    
18 FCR18    
19 FCR19    
20 FCR20    
21 FCR21    
22 FCR22    
23 FCR23    
24 FCR24    
25 FCCR Floating Point Condition Codes Register  
26 FEXR Floating Point Exceptions Register  
27 FCR27    
28 FENR Floating Point Enables Register  
29 FCR29    
30 FCR30    
31 FCSR Floating Point Control and Status Register sysmem, interruptman, paf, pafmini


4.5  COP2 (VFPU)


The PSP's VFPU (Vector Floating Point Unit) is a coprocessor whose main purpose is vector and matrix processing. It also supports trigonometric functions and other mathematical operations, conversions, and mathematical constants.

4.5.1  Registers


The VFPU has 128 single-precision floating-point (IEEE 754) registers (VFR0-VFR127), which can be arranged and accessed in several ways; this makes the unit very flexible. Many VFPU instructions operate on single values, pairs, triples or quads (matching the .s, .p, .t and .q instruction suffixes) as well as on 2x2, 3x3 and 4x4 matrices. Matrices can also be accessed in normal or transposed order.

The registers are grouped into 8 blocks of 16 registers each. This gives enough room for 8 4x4 matrices, 8 3x3 matrices or 32 2x2 matrices. Alternatively, you can store up to 32 quad vectors, 40 triple vectors, 64 pair vectors or 128 single values.

The register name to use depends heavily on the instruction being performed, so working out how to access or modify a particular register can quickly become confusing. Register names consist of a letter followed by 3 digits: matrix, column and row. The tables below show how single, pair, triple, quad and matrix registers are mapped within a single 16-register block.

Single Registers
S000 S010 S020 S030
S001 S011 S021 S031
S002 S012 S022 S032
S003 S013 S023 S033

Quad Columns
C000 C010 C020 C030
.... .... .... ....
.... .... .... ....
.... .... .... ....

Quad Rows
R000 .... .... ....
R001 .... .... ....
R002 .... .... ....
R003 .... .... ....

4*4 Matrix
M000 .... .... ....
.... .... .... ....
.... .... .... ....
.... .... .... ....

4*4 Transpose Matrix
E000 .... .... ....
.... .... .... ....
.... .... .... ....
.... .... .... ....

Triple Columns (1)
C000 C010 C020 C030
.... .... .... ....
.... .... .... ....
                   

Triple Columns (2)
                   
C001 C011 C021 C031
.... .... .... ....
.... .... .... ....

Triple Rows (1)
R000 .... ....     
R001 .... ....     
R002 .... ....     
R003 .... ....     

Triple Rows (2)
     R010 .... ....
     R011 .... ....
     R012 .... ....
     R013 .... ....

3*3 Matrix (1)
M000 .... ....     
.... .... ....     
.... .... ....     
                   

3*3 Matrix (2)
                   
M001 .... ....     
.... .... ....     
.... .... ....     

3*3 Matrix (3)
     M010 .... ....
     .... .... ....
     .... .... ....
                   

3*3 Matrix (4)
                   
     M011 .... ....
     .... .... ....
     .... .... ....

3*3 Transpose Matrix (1)
E000 .... ....     
.... .... ....     
.... .... ....     
                   

3*3 Transpose Matrix (2)
                   
E001 .... ....     
.... .... ....     
.... .... ....     

3*3 Transpose Matrix (3)
     E010 .... ....
     .... .... ....
     .... .... ....
                   

3*3 Transpose Matrix (4)
                   
     E011 .... ....
     .... .... ....
     .... .... ....

Pair Columns
C000 C010 C020 C030
.... .... .... ....
C002 C012 C022 C032
.... .... .... ....

Pair Rows
R000 .... R020 ....
R001 .... R021 ....
R002 .... R022 ....
R003 .... R023 ....

2*2 Matrix
M000 .... M020 ....
.... .... .... ....
M002 .... M022 ....
.... .... .... ....

2*2 Transpose Matrix
E000 .... E020 ....
.... .... .... ....
E002 .... E022 ....
.... .... .... ....

Repeat all of the above for the other 7 blocks of registers: change the first digit of the register name to work on a different block.
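
The naming scheme (a letter, then matrix, column and row digits) can be sketched as a small parser. This is a hedged Python illustration only: `parse_vfpu_name` and `single_names` are hypothetical helpers, and the code deliberately does not compute hardware register numbers, since that mapping is not given in this section.

```python
import re

# One letter (S, C, R, M or E) followed by matrix (0-7), column (0-3), row (0-3).
_VFPU_NAME = re.compile(r"^([SCRME])([0-7])([0-3])([0-3])$")


def parse_vfpu_name(name):
    """Split a register name such as 'S123' into (kind, matrix, column, row)."""
    match = _VFPU_NAME.match(name.upper())
    if match is None:
        raise ValueError("not a VFPU register name: %r" % (name,))
    kind, matrix, column, row = match.groups()
    return kind, int(matrix), int(column), int(row)


def single_names(block):
    """Return the 4x4 grid of single register names for one block,
    row by row, in the same layout as the single register table."""
    if not 0 <= block <= 7:
        raise ValueError("block must be 0..7")
    return [["S%d%d%d" % (block, column, row) for column in range(4)]
            for row in range(4)]
```

For block 0 the second row of `single_names(0)` is `S001 S011 S021 S031`, matching the table; passing another block number changes only the first digit.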

4.5.2  Extra Registers


128 VFPU_PFXS Source prefix stack
129 VFPU_PFXT Target prefix stack
130 VFPU_PFXD Destination prefix stack
131 VFPU_CC Condition information
132 VFPU_INF4 VFPU internal information 4
133 VFPU_RSV5 Not used (reserved)
134 VFPU_RSV6 Not used (reserved)
135 VFPU_REV VFPU revision information
136 VFPU_RCX0 Pseudorandom number generator information 0
137 VFPU_RCX1 Pseudorandom number generator information 1
138 VFPU_RCX2 Pseudorandom number generator information 2
139 VFPU_RCX3 Pseudorandom number generator information 3
140 VFPU_RCX4 Pseudorandom number generator information 4
141 VFPU_RCX5 Pseudorandom number generator information 5
142 VFPU_RCX6 Pseudorandom number generator information 6
143 VFPU_RCX7 Pseudorandom number generator information 7


4.6  Instruction Format


Every CPU instruction consists of a single word (32 bits) aligned on a word boundary. The major instruction formats are:
I-type (immediate): op rs rt immediate
J-type (jump): op target
R-type (register): op rs rt rd shamt func
where:
op 6-bit operation code
rs 5-bit source register specifier
rt 5-bit target (source/destination) register or branch condition
immediate 16-bit immediate, branch displacement or address displacement
target 26-bit jump target address
rd 5-bit destination register specifier
shamt 5-bit shift amount
func 6-bit function field
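
To make the field widths concrete, here is a minimal Python sketch that extracts every field from a 32-bit instruction word. It assumes the standard MIPS bit positions (op in bits 31-26, rs in 25-21, rt in 20-16, rd in 15-11, shamt in 10-6, func in 5-0), which are consistent with the widths listed above; `decode_fields` is a hypothetical name.

```python
def decode_fields(word):
    """Return all instruction fields of a 32-bit word as a dict.

    Every field is extracted unconditionally; which ones are meaningful
    depends on the instruction format (I, J or R type).
    """
    if not 0 <= word <= 0xFFFFFFFF:
        raise ValueError("instruction word must fit in 32 bits")
    return {
        "op": (word >> 26) & 0x3F,       # 6-bit operation code
        "rs": (word >> 21) & 0x1F,       # 5-bit source register
        "rt": (word >> 16) & 0x1F,       # 5-bit target register
        "rd": (word >> 11) & 0x1F,       # 5-bit destination register
        "shamt": (word >> 6) & 0x1F,     # 5-bit shift amount
        "func": word & 0x3F,             # 6-bit function field
        "immediate": word & 0xFFFF,      # 16-bit immediate (unsigned view)
        "target": word & 0x03FFFFFF,     # 26-bit jump target
    }
```

For example, decoding the word `0x27BDFFF0` gives op 9 (binary 001001), rs 29, rt 29 and immediate 0xFFF0.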


4.7  MIPS Instructions


Mnemonic Opcode op     rs    rt    offset           Description
lw rt, offset(rs) 0x8c000000 100011 sssss ttttt oooooooooooooooo LoadWord Relative to Address in General Purpose Register
sw rt, offset(rs) 0xac000000 101011 sssss ttttt oooooooooooooooo StoreWord Relative to Address in General Purpose Register
Mnemonic Opcode op     rs    rt    immediate        Description
addiu rt,rs,immediate 0x24000000 001001 sssss ttttt iiiiiiiiiiiiiiii Add Immediate Unsigned Word
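
The rows above can be combined into a small encoder. This is a sketch in Python under the assumption that the base opcode is simply OR-ed with the shifted register numbers and the low 16 bits of the immediate; `ITYPE_OPCODES` and `encode_itype` are hypothetical names.

```python
# Base opcodes taken from the table above.
ITYPE_OPCODES = {"lw": 0x8C000000, "sw": 0xAC000000, "addiu": 0x24000000}


def encode_itype(mnemonic, rt, rs, immediate):
    """Encode 'mnemonic rt, immediate(rs)' or 'mnemonic rt, rs, immediate'."""
    if not (0 <= rs <= 31 and 0 <= rt <= 31):
        raise ValueError("register numbers must be 0..31")
    if not -0x8000 <= immediate <= 0xFFFF:
        raise ValueError("immediate does not fit in 16 bits")
    # Negative immediates are stored in two's complement form.
    return (ITYPE_OPCODES[mnemonic] | (rs << 21) | (rt << 16)
            | (immediate & 0xFFFF))
```

With sp = 29 and ra = 31, `encode_itype("addiu", 29, 29, -16)` yields `0x27BDFFF0` and `encode_itype("lw", 31, 29, 4)` yields `0x8FBF0004`.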

4.7.1  lw


lw LoadWord Relative to Address in General Purpose Register
  %rt <- word_at_address (offset + %base)
lw %rt, offset(%base)  
%rt GPR Target Register (0...31)
%base GPR, specifies Source Address Base
offset signed Offset added to Source Address Base


4.7.2  sw


sw StoreWord Relative to Address in General Purpose Register
  word_at_address (offset + %base) <- %rt
sw %rt, offset(%base)  
%rt GPR Source Register (0...31)
%base GPR, specifies Destination Address Base
offset signed Offset added to Destination Address Base
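
The lw and sw operations can be modelled in a few lines. The following Python sketch is an illustration, not an emulator: it assumes little-endian byte order and word-aligned addresses, and the names `lw`, `sw`, `sext16`, `mem` and `regs` are hypothetical.

```python
import struct


def sext16(value):
    """Sign-extend a 16-bit offset to a Python int."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def _address(regs, base, offset):
    addr = (regs[base] + sext16(offset)) & 0xFFFFFFFF
    if addr % 4:
        raise ValueError("word access must be 4-byte aligned")
    return addr


def lw(mem, regs, rt, offset, base):
    """%rt <- word_at_address(offset + %base)"""
    value = struct.unpack_from("<I", mem, _address(regs, base, offset))[0]
    if rt != 0:  # register 0 is wired to zero
        regs[rt] = value


def sw(mem, regs, rt, offset, base):
    """word_at_address(offset + %base) <- %rt"""
    struct.pack_into("<I", mem, _address(regs, base, offset),
                     regs[rt] & 0xFFFFFFFF)
```

Here `mem` is a `bytearray` standing in for memory and `regs` is a list of 32 integers; storing a register with `sw` and reading it back with `lw` at the same base and offset returns the same word.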


4.7.3  addiu


addiu Add Immediate Unsigned Word
  %rt <- %rs + sign_extended(immediate)
addiu %rt, %rs, immediate  
%rt GPR Target Register (0...31)
%rs GPR Source Register (0...31)
immediate signed 16-bit value added to Source Register
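
Despite the "unsigned" in its name, addiu sign-extends the immediate; the result simply wraps around at 32 bits. A minimal Python sketch of that arithmetic, with the hypothetical name `addiu`:

```python
def addiu(rs_value, immediate):
    """Model %rt <- %rs + sign_extended(immediate) with 32-bit wraparound."""
    imm = immediate & 0xFFFF
    if imm & 0x8000:          # bit 15 set: the immediate is negative
        imm -= 0x10000
    return (rs_value + imm) & 0xFFFFFFFF
```

So `addiu(0x10, 0xFFF0)` subtracts 16 and returns 0, which is how a stack pointer adjustment such as `addiu sp, sp, -16` works.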


4.8  Allegrex Instructions


Mnemonic Opcode op     rs    rt    rd    shamt func Description
halt 0x70000000 011100 00000 00000 00000 00000 000000 halt execution until next interrupt
mfic rt,rd 0x70000024 011100 00000 ttttt ddddd 00000 100100 move from IC (Interrupt) register
mtic rt,rd 0x70000026 011100 00000 ttttt ddddd 00000 100110 move to IC (Interrupt) register
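
Based only on the bit patterns in the table above, these instructions could be assembled as follows. This is a Python sketch; `encode_mfic` and `encode_mtic` are hypothetical helpers, and the meaning of the rd field (which IC register is addressed) is not documented here.

```python
HALT = 0x70000000  # no operands


def _encode(base, rt, rd):
    if not (0 <= rt <= 31 and 0 <= rd <= 31):
        raise ValueError("register numbers must be 0..31")
    # rt occupies bits 20-16, rd bits 15-11 (see the field layout above).
    return base | (rt << 16) | (rd << 11)


def encode_mfic(rt, rd):
    """Encode 'mfic rt,rd' (move from IC register)."""
    return _encode(0x70000024, rt, rd)


def encode_mtic(rt, rd):
    """Encode 'mtic rt,rd' (move to IC register)."""
    return _encode(0x70000026, rt, rd)
```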

4.8.1  halt


halt halt execution until next interrupt
   
halt  
   
   
   


4.8.2  mfic / mtic


mfic move from IC (Interrupt) register
   
mfic rt,rd  
   
   
   


mtic move to IC (Interrupt) register
   
mtic rt,rd  
   
   
   


4.9  VFPU Instructions


Mnemonic Opcode op     rs    rt    offset         c   Description
lv.q rt, offset(rs) 0xd8000000 110110 sssss ttttt oooooooooooooo 0 t LoadVector.Quadword Relative to Address in GPR
sv.q rt, offset(rs), wb 0xf8000000 111110 sssss ttttt oooooooooooooo w t StoreVector.Quadword Relative to Address in GPR
Mnemonic Opcode op         rt        rs        rd      Description
vadd.s rd,rs,rt 0x60000000 011000 000 ttttttt 0 sssssss 0 ddddddd  
vadd.p rd,rs,rt 0x60000080 011000 000 ttttttt 0 sssssss 1 ddddddd  
vadd.t rd,rs,rt 0x60008000 011000 000 ttttttt 1 sssssss 0 ddddddd  
vadd.q rd,rs,rt 0x60008080 011000 000 ttttttt 1 sssssss 1 ddddddd  
vsub.s rd,rs,rt 0x60800000 011000 001 ttttttt 0 sssssss 0 ddddddd  
vsub.p rd,rs,rt 0x60800080 011000 001 ttttttt 0 sssssss 1 ddddddd  
vsub.t rd,rs,rt 0x60808000 011000 001 ttttttt 1 sssssss 0 ddddddd  
vsub.q rd,rs,rt 0x60808080 011000 001 ttttttt 1 sssssss 1 ddddddd  
vdiv.s rd,rs,rt 0x63800000 011000 111 ttttttt 0 sssssss 0 ddddddd  
vdiv.p rd,rs,rt 0x63800080 011000 111 ttttttt 0 sssssss 1 ddddddd  
vdiv.t rd,rs,rt 0x63808000 011000 111 ttttttt 1 sssssss 0 ddddddd  
vdiv.q rd,rs,rt 0x63808080 011000 111 ttttttt 1 sssssss 1 ddddddd  
vmul.s rd,rs,rt 0x64000000 011001 000 ttttttt 0 sssssss 0 ddddddd  
vmul.p rd,rs,rt 0x64000080 011001 000 ttttttt 0 sssssss 1 ddddddd  
vmul.t rd,rs,rt 0x64008000 011001 000 ttttttt 1 sssssss 0 ddddddd  
vmul.q rd,rs,rt 0x64008080 011001 000 ttttttt 1 sssssss 1 ddddddd  
vdot.p rd,rs,rt 0x64800080 011001 001 ttttttt 0 sssssss 1 ddddddd  
vdot.t rd,rs,rt 0x64808000 011001 001 ttttttt 1 sssssss 0 ddddddd  
vdot.q rd,rs,rt 0x64808080 011001 001 ttttttt 1 sssssss 1 ddddddd  
vhdp.p rd,rs,rt 0x66000080 011001 100 ttttttt 0 sssssss 1 ddddddd  
vhdp.t rd,rs,rt 0x66008000 011001 100 ttttttt 1 sssssss 0 ddddddd  
vhdp.q rd,rs,rt 0x66008080 011001 100 ttttttt 1 sssssss 1 ddddddd  
vmin.s rd,rs,rt 0x6D000000 011011 010 ttttttt 0 sssssss 0 ddddddd  
vmin.p rd,rs,rt 0x6D000080 011011 010 ttttttt 0 sssssss 1 ddddddd  
vmin.t rd,rs,rt 0x6D008000 011011 010 ttttttt 1 sssssss 0 ddddddd  
vmin.q rd,rs,rt 0x6D008080 011011 010 ttttttt 1 sssssss 1 ddddddd  
vmax.s rd,rs,rt 0x6D800000 011011 011 ttttttt 0 sssssss 0 ddddddd  
vmax.p rd,rs,rt 0x6D800080 011011 011 ttttttt 0 sssssss 1 ddddddd  
vmax.t rd,rs,rt 0x6D808000 011011 011 ttttttt 1 sssssss 0 ddddddd  
vmax.q rd,rs,rt 0x6D808080 011011 011 ttttttt 1 sssssss 1 ddddddd  
vabs.s rd,rs 0xd0010000 110100 000 0000001 0 sssssss 0 ddddddd  
vabs.p rd,rs 0xd0010080 110100 000 0000001 0 sssssss 1 ddddddd  
vabs.t rd,rs 0xd0018000 110100 000 0000001 1 sssssss 0 ddddddd  
vabs.q rd,rs 0xd0018080 110100 000 0000001 1 sssssss 1 ddddddd  
vneg.s rd,rs 0xd0020000 110100 000 0000010 0 sssssss 0 ddddddd  
vneg.p rd,rs 0xd0020080 110100 000 0000010 0 sssssss 1 ddddddd  
vneg.t rd,rs 0xd0028000 110100 000 0000010 1 sssssss 0 ddddddd  
vneg.q rd,rs 0xd0028080 110100 000 0000010 1 sssssss 1 ddddddd  
vidt.p rd 0xd0030080 110100 000 0000011 0 0000000 1 ddddddd  
vidt.t rd 0xd0038000 110100 000 0000011 1 0000000 0 ddddddd  
vidt.q rd 0xd0038080 110100 000 0000011 1 0000000 1 ddddddd  
vzero.s rd 0xd0060000 110100 000 0000110 0 0000000 0 ddddddd SetVectorZero.Single
vzero.p rd 0xd0060080 110100 000 0000110 0 0000000 1 ddddddd SetVectorZero.Pair
vzero.t rd 0xd0068000 110100 000 0000110 1 0000000 0 ddddddd SetVectorZero.Triple
vzero.q rd 0xd0068080 110100 000 0000110 1 0000000 1 ddddddd SetVectorZero.Quad
vone.s rd 0xd0070000 110100 000 0000111 0 0000000 0 ddddddd SetVectorOne.Single
vone.p rd 0xd0070080 110100 000 0000111 0 0000000 1 ddddddd SetVectorOne.Pair
vone.t rd 0xd0078000 110100 000 0000111 1 0000000 0 ddddddd SetVectorOne.Triple
vone.q rd 0xd0078080 110100 000 0000111 1 0000000 1 ddddddd SetVectorOne.Quad
vrcp.s rd,rs 0xd0100000 110100 000 0010000 0 sssssss 0 ddddddd  
vrcp.p rd,rs 0xd0100080 110100 000 0010000 0 sssssss 1 ddddddd  
vrcp.t rd,rs 0xd0108000 110100 000 0010000 1 sssssss 0 ddddddd  
vrcp.q rd,rs 0xd0108080 110100 000 0010000 1 sssssss 1 ddddddd  
vrsq.s rd,rs 0xd0110000 110100 000 0010001 0 sssssss 0 ddddddd  
vrsq.p rd,rs 0xd0110080 110100 000 0010001 0 sssssss 1 ddddddd  
vrsq.t rd,rs 0xd0118000 110100 000 0010001 1 sssssss 0 ddddddd  
vrsq.q rd,rs 0xd0118080 110100 000 0010001 1 sssssss 1 ddddddd  
vsin.s rd,rs 0xd0120000 110100 000 0010010 0 sssssss 0 ddddddd  
vsin.p rd,rs 0xd0120080 110100 000 0010010 0 sssssss 1 ddddddd  
vsin.t rd,rs 0xd0128000 110100 000 0010010 1 sssssss 0 ddddddd  
vsin.q rd,rs 0xd0128080 110100 000 0010010 1 sssssss 1 ddddddd  
vcos.s rd,rs 0xd0130000 110100 000 0010011 0 sssssss 0 ddddddd  
vcos.p rd,rs 0xd0130080 110100 000 0010011 0 sssssss 1 ddddddd  
vcos.t rd,rs 0xd0138000 110100 000 0010011 1 sssssss 0 ddddddd  
vcos.q rd,rs 0xd0138080 110100 000 0010011 1 sssssss 1 ddddddd  
vexp2.s rd,rs 0xd0140000 110100 000 0010100 0 sssssss 0 ddddddd  
vexp2.p rd,rs 0xd0140080 110100 000 0010100 0 sssssss 1 ddddddd  
vexp2.t rd,rs 0xd0148000 110100 000 0010100 1 sssssss 0 ddddddd  
vexp2.q rd,rs 0xd0148080 110100 000 0010100 1 sssssss 1 ddddddd  
vlog2.s rd,rs 0xd0150000 110100 000 0010101 0 sssssss 0 ddddddd  
vlog2.p rd,rs 0xd0150080 110100 000 0010101 0 sssssss 1 ddddddd  
vlog2.t rd,rs 0xd0158000 110100 000 0010101 1 sssssss 0 ddddddd  
vlog2.q rd,rs 0xd0158080 110100 000 0010101 1 sssssss 1 ddddddd  
vsqrt.s rd,rs 0xd0160000 110100 000 0010110 0 sssssss 0 ddddddd  
vsqrt.p rd,rs 0xd0160080 110100 000 0010110 0 sssssss 1 ddddddd  
vsqrt.t rd,rs 0xd0168000 110100 000 0010110 1 sssssss 0 ddddddd  
vsqrt.q rd,rs 0xd0168080 110100 000 0010110 1 sssssss 1 ddddddd  
vasin.s rd,rs 0xd0170000 110100 000 0010111 0 sssssss 0 ddddddd  
vasin.p rd,rs 0xd0170080 110100 000 0010111 0 sssssss 1 ddddddd  
vasin.t rd,rs 0xd0178000 110100 000 0010111 1 sssssss 0 ddddddd  
vasin.q rd,rs 0xd0178080 110100 000 0010111 1 sssssss 1 ddddddd  
vnrcp.s rd,rs 0xd0180000 110100 000 0011000 0 sssssss 0 ddddddd  
vnrcp.p rd,rs 0xd0180080 110100 000 0011000 0 sssssss 1 ddddddd  
vnrcp.t rd,rs 0xd0188000 110100 000 0011000 1 sssssss 0 ddddddd  
vnrcp.q rd,rs 0xd0188080 110100 000 0011000 1 sssssss 1 ddddddd  
vnsin.s rd,rs 0xd01a0000 110100 000 0011010 0 sssssss 0 ddddddd  
vnsin.p rd,rs 0xd01a0080 110100 000 0011010 0 sssssss 1 ddddddd  
vnsin.t rd,rs 0xd01a8000 110100 000 0011010 1 sssssss 0 ddddddd  
vnsin.q rd,rs 0xd01a8080 110100 000 0011010 1 sssssss 1 ddddddd  
vrexp2.s rd,rs 0xd01c0000 110100 000 0011100 0 sssssss 0 ddddddd  
vrexp2.p rd,rs 0xd01c0080 110100 000 0011100 0 sssssss 1 ddddddd  
vrexp2.t rd,rs 0xd01c8000 110100 000 0011100 1 sssssss 0 ddddddd  
vrexp2.q rd,rs 0xd01c8080 110100 000 0011100 1 sssssss 1 ddddddd  
vi2uc.q rd,rs 0xd03c8080 110100 000 0111100 1 sssssss 1 ddddddd int to unsigned char
vi2s.p rd,rs 0xd03f0080 110100 000 0111111 0 sssssss 1 ddddddd int to short
vi2s.q rd,rs 0xd03f8080 110100 000 0111111 1 sssssss 1 ddddddd int to short
vsgn.s rd,rs 0xd04a0000 110100 000 1001010 0 sssssss 0 ddddddd  
vsgn.p rd,rs 0xd04a0080 110100 000 1001010 0 sssssss 1 ddddddd  
vsgn.t rd,rs 0xd04a8000 110100 000 1001010 1 sssssss 0 ddddddd  
vsgn.q rd,rs 0xd04a8080 110100 000 1001010 1 sssssss 1 ddddddd  
vcst.s rd, a 0xd0600000 110100 000 11aaaaa 0 0000000 0 ddddddd  
vcst.p rd, a 0xd0600080 110100 000 11aaaaa 0 0000000 1 ddddddd  
vcst.t rd, a 0xd0608000 110100 000 11aaaaa 1 0000000 0 ddddddd  
vcst.q rd, a 0xd0608080 110100 000 11aaaaa 1 0000000 1 ddddddd  
vf2in.s rd,rs,scale 0xd2000000 110100 100 SSSSSSS 0 sssssss 0 ddddddd float to int, round to nearest
vf2in.p rd,rs,scale 0xd2000080 110100 100 SSSSSSS 0 sssssss 1 ddddddd  
vf2in.t rd,rs,scale 0xd2008000 110100 100 SSSSSSS 1 sssssss 0 ddddddd  
vf2in.q rd,rs,scale 0xd2008080 110100 100 SSSSSSS 1 sssssss 1 ddddddd  
vi2f.s rd,rs,scale 0xd2800000 110100 101 SSSSSSS 0 sssssss 0 ddddddd int to float
vi2f.p rd,rs,scale 0xd2800080 110100 101 SSSSSSS 0 sssssss 1 ddddddd  
vi2f.t rd,rs,scale 0xd2808000 110100 101 SSSSSSS 1 sssssss 0 ddddddd  
vi2f.q rd,rs,scale 0xd2808080 110100 101 SSSSSSS 1 sssssss 1 ddddddd  
vmmul.p rd,rs,rt 0xf0000080 111100 000 ttttttt 0 sSsssss 1 ddddddd (*1)
vmmul.t rd,rs,rt 0xf0008000 111100 000 ttttttt 1 sSsssss 0 ddddddd (*1)
vmmul.q rd,rs,rt 0xf0008080 111100 000 ttttttt 1 sSsssss 1 ddddddd (*1)
vhtfm2.p rd,rs,rt 0xf0800000 111100 001 ttttttt 0 sssssss 0 ddddddd  
vtfm2.p rd,rs,rt 0xf0800080 111100 001 ttttttt 0 sssssss 1 ddddddd  
vhtfm3.t rd,rs,rt 0xf1000080 111100 010 ttttttt 0 sssssss 1 ddddddd  
vtfm3.t rd,rs,rt 0xf1008000 111100 010 ttttttt 1 sssssss 0 ddddddd  
vhtfm4.q rd,rs,rt 0xf1808000 111100 011 ttttttt 1 sssssss 0 ddddddd  
vtfm4.q rd,rs,rt 0xf1808080 111100 011 ttttttt 1 sssssss 1 ddddddd  
vmidt.p rd 0xf3830080 111100 111 0000011 0 0000000 1 ddddddd SetMatrixIdentity.Pair
vmidt.t rd 0xf3838000 111100 111 0000011 1 0000000 0 ddddddd SetMatrixIdentity.Triple
vmidt.q rd 0xf3838080 111100 111 0000011 1 0000000 1 ddddddd SetMatrixIdentity.Quad
vmzero.p rd 0xf3860080 111100 111 0000110 0 0000000 1 ddddddd SetMatrixZero.Pair
vmzero.t rd 0xf3868000 111100 111 0000110 1 0000000 0 ddddddd SetMatrixZero.Triple
vmzero.q rd 0xf3868080 111100 111 0000110 1 0000000 1 ddddddd SetMatrixZero.Quad
(*1) bit 5 of rs is inverted

VFPU load/store instructions seem to support only 16-byte-aligned accesses (similar to AltiVec and SSE).
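
The three-operand rows of the table above share one layout: a base opcode, rt in bits 22-16, rs in bits 14-8, rd in bits 6-0, and the operand size spread over bit 15 and bit 7. A hedged Python sketch of that layout follows; `encode_vfpu3` is a hypothetical name, the register arguments are raw 7-bit field values (the mapping from names such as S000 to field values is not covered here), and the vmmul special case (*1) is not handled.

```python
# Base opcodes (size bits cleared) taken from the table above.
VFPU3_BASE = {
    "vadd": 0x60000000,
    "vsub": 0x60800000,
    "vdiv": 0x63800000,
    "vmul": 0x64000000,
}

# Size suffix -> two-bit code; the high bit goes to bit 15, the low bit to bit 7.
SIZE_CODE = {"s": 0, "p": 1, "t": 2, "q": 3}


def encode_vfpu3(mnemonic, size, rd, rs, rt):
    """Encode e.g. 'vadd.q rd,rs,rt' from raw 7-bit register fields."""
    for reg in (rd, rs, rt):
        if not 0 <= reg <= 127:
            raise ValueError("register field must be 0..127")
    code = SIZE_CODE[size]
    word = VFPU3_BASE[mnemonic]
    word |= rt << 16
    word |= (code >> 1) << 15
    word |= rs << 8
    word |= (code & 1) << 7
    word |= rd
    return word
```

With all register fields zero this reproduces the opcode column, for example `0x60008080` for vadd.q and `0x64000080` for vmul.p.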

4.9.1  lv


lv LoadVector Quadword Relative to Address in General Purpose Register
  %vfpu_rt <- vector_at_address (offset + %base)
lv.q %vfpu_rt, offset(%base)  
%vfpu_rt VFPU Vector Target Register (column0-31/row32-63)
%base GPR, specifies Source Address Base
offset signed Offset added to Source Address Base

Final Address needs to be 16-byte aligned.

4.9.2  sv


sv StoreVector Quadword Relative to Address in General Purpose Register
  vector_at_address (offset + %base) <- %vfpu_rt
sv.q %vfpu_rt, offset(%base), cache_policy  
%vfpu_rt VFPU Vector Source Register (column0-31/row32-63)
%base GPR, specifies Destination Address Base
offset signed Offset added to Destination Address Base
cache_policy 0 = write-through, 1 = write-back

Final Address needs to be 16-byte aligned.
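
A quadword is four 32-bit floats, i.e. 16 bytes, so the alignment rule for lv.q and sv.q can be checked before issuing the access. The Python sketch below assumes that 16-byte requirement; `quad_address` and `align_up` are hypothetical helper names.

```python
QUAD_BYTES = 16  # four single-precision floats


def quad_address(base, offset):
    """Return the final address (offset + base), or raise if misaligned."""
    addr = (base + offset) & 0xFFFFFFFF
    if addr % QUAD_BYTES:
        raise ValueError("address 0x%08x is not 16-byte aligned" % addr)
    return addr


def align_up(addr, alignment=QUAD_BYTES):
    """Round addr up to the next multiple of alignment (a power of two)."""
    return (addr + alignment - 1) & ~(alignment - 1)
```

A buffer that starts at an arbitrary address can be passed through `align_up` first; for instance `align_up(0x08800004)` gives `0x08800010`.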

4.9.3  vzero


vzero SetVectorZero (Single/Pair/Triple/Quad)
  vfpu_regs[%vfpu_rt] <- 0.0f
vzero.s %vfpu_rt Set 1 Vector Component to 0.0f
vzero.p %vfpu_rt Set 2 Vector Components to 0.0f
vzero.t %vfpu_rt Set 3 Vector Components to 0.0f
vzero.q %vfpu_rt Set 4 Vector Components to 0.0f
%vfpu_rt VFPU Vector Target Register ([s - p - t - q]reg 0..127)
   
   


4.9.4  vone


vone SetVectorOne (Single/Pair/Triple/Quad)
  vfpu_regs[%vfpu_rt] <- 1.0f
vone.s %vfpu_rt Set 1 Vector Component to 1.0f
vone.p %vfpu_rt Set 2 Vector Components to 1.0f
vone.t %vfpu_rt Set 3 Vector Components to 1.0f
vone.q %vfpu_rt Set 4 Vector Components to 1.0f
%vfpu_rt VFPU Vector Target Register ([s - p - t - q]reg 0..127)
   
   


4.9.5  vmzero


vmzero SetMatrixZero (Pair/Triple/Quad)
  vfpu_mtx[%vfpu_rt] <- 0.0f
vmzero.p %vfpu_rt Set 2x2 Submatrix to 0.0f
vmzero.t %vfpu_rt Set 3x3 Submatrix to 0.0f
vmzero.q %vfpu_rt Set 4x4 Matrix to 0.0f
%vfpu_rt VFPU Matrix Target Register ([p - t - q]reg 0..127)
   
   


4.9.6  vmidt


vmidt SetMatrixIdentity (Pair/Triple/Quad)
  vfpu_mtx[%vfpu_rt] <- identity matrix
vmidt.p %vfpu_rt Set 2x2 Submatrix to Identity
vmidt.t %vfpu_rt Set 3x3 Submatrix to Identity
vmidt.q %vfpu_rt Set 4x4 Matrix to Identity
%vfpu_rt VFPU Matrix Target Register ([p - t - q]reg 0..127)
   
   


4.9.7  vmmul


vmmul MatrixMultiply (Pair/Triple/Quad)
   
vmmul.p %vfpu_rd, %vfpu_rs, %vfpu_rt multiply 2 2x2 Submatrices
vmmul.t %vfpu_rd, %vfpu_rs, %vfpu_rt multiply 2 3x3 Submatrices
vmmul.q %vfpu_rd, %vfpu_rs, %vfpu_rt multiply 2 4x4 Matrices
   
   
   


4.9.8  vrcp


vrcp Reciprocal (Single/Pair/Triple/Quad)
  vfpu_regs[%vfpu_rd] <- 1.0 / vfpu_regs[%vfpu_rs]
vrcp.s %vfpu_rd, %vfpu_rs calculate reciprocal (1/z) on single
vrcp.p %vfpu_rd, %vfpu_rs calculate reciprocal (1/z) on pair
vrcp.t %vfpu_rd, %vfpu_rs calculate reciprocal (1/z) on triple
vrcp.q %vfpu_rd, %vfpu_rs calculate reciprocal (1/z) on quad
%vfpu_rd VFPU Vector Target Register ([s - p - t - q]reg 0..127)
%vfpu_rs VFPU Vector Source Register ([s - p - t - q]reg 0..127)
   


4.9.9  vexp2


vexp2 Exp2 (Single/Pair/Triple/Quad) (calculate 2 raised to the specified real number)
  vfpu_regs[%vfpu_rd] <- 2^(vfpu_regs[%vfpu_rs])
vexp2.s %vfpu_rd, %vfpu_rs calculate 2 ** y
vexp2.p %vfpu_rd, %vfpu_rs calculate 2 ** y
vexp2.t %vfpu_rd, %vfpu_rs calculate 2 ** y
vexp2.q %vfpu_rd, %vfpu_rs calculate 2 ** y
%vfpu_rd VFPU Vector Target Register ([s - p - t - q]reg 0..127)
%vfpu_rs VFPU Vector Source Register ([s - p - t - q]reg 0..127)
   


4.9.10  vlog2


vlog2 Log2 (Single/Pair/Triple/Quad) (calculate logarithm base 2 of the specified real number)
  vfpu_regs[%vfpu_rd] <- log2(vfpu_regs[%vfpu_rs])
vlog2.s %vfpu_rd, %vfpu_rs  
vlog2.p %vfpu_rd, %vfpu_rs  
vlog2.t %vfpu_rd, %vfpu_rs  
vlog2.q %vfpu_rd, %vfpu_rs  
%vfpu_rd VFPU Vector Target Register ([s - p - t - q]reg 0..127)
%vfpu_rs VFPU Vector Source Register ([s - p - t - q]reg 0..127)
   


4.9.11  vsqrt


vsqrt SquareRoot (Single/Pair/Triple/Quad)
  vfpu_regs[%vfpu_rd] <- sqrt(vfpu_regs[%vfpu_rs])
vsqrt.s %vfpu_rd, %vfpu_rs calculate square root
vsqrt.p %vfpu_rd, %vfpu_rs calculate square root
vsqrt.t %vfpu_rd, %vfpu_rs calculate square root
vsqrt.q %vfpu_rd, %vfpu_rs calculate square root
%vfpu_rd VFPU Vector Target Register ([s - p - t - q]reg 0..127)
%vfpu_rs VFPU Vector Source Register ([s - p - t - q]reg 0..127)
   


4.9.12  vrsq


vrsq ReciprocalSquareRoot (Single/Pair/Triple/Quad)
  vfpu_regs[%vfpu_rd] <- 1.0 / sqrt(vfpu_regs[%vfpu_rs])
vrsq.s %vfpu_rd, %vfpu_rs calculate reciprocal sqrt (1/sqrt(x)) on single
vrsq.p %vfpu_rd, %vfpu_rs calculate reciprocal sqrt (1/sqrt(x)) on pair
vrsq.t %vfpu_rd, %vfpu_rs calculate reciprocal sqrt (1/sqrt(x)) on triple
vrsq.q %vfpu_rd, %vfpu_rs calculate reciprocal sqrt (1/sqrt(x)) on quad
%vfpu_rd VFPU Vector Target Register ([s - p - t - q]reg 0..127)
%vfpu_rs VFPU Vector Source Register ([s - p - t - q]reg 0..127)
   


4.9.13  vsin


vsin Sine (Single/Pair/Triple/Quad)
  vfpu_regs[%vfpu_rd] <- sin(vfpu_regs[%vfpu_rs])
vsin.s %vfpu_rd, %vfpu_rs calculate sin on single
vsin.p %vfpu_rd, %vfpu_rs calculate sin on pair
vsin.t %vfpu_rd, %vfpu_rs calculate sin on triple
vsin.q %vfpu_rd, %vfpu_rs calculate sin on quad
%vfpu_rd VFPU Vector Target Register ([s - p - t - q]reg 0..127)
%vfpu_rs VFPU Vector Source Register ([s - p - t - q]reg 0..127)
   

Note: trig functions on the VFPU take their argument in units of 90 degrees (quarter turns), e.g. vsin(degrees/90) or vsin(2/PI * radians)
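
The input scaling described in the note can be sketched as follows. This Python model assumes that vsin behaves like sin(x * PI/2), which is what the degrees/90 convention implies; the function names are hypothetical and the model says nothing about hardware precision.

```python
import math


def degrees_to_vfpu(degrees):
    """Convert an angle in degrees to the VFPU's quarter-turn units."""
    return degrees / 90.0


def radians_to_vfpu(radians):
    """Convert an angle in radians to the VFPU's quarter-turn units."""
    return radians * 2.0 / math.pi


def vsin_model(x):
    """Model of vsin on one component: x is given in quarter turns."""
    return math.sin(x * math.pi / 2.0)
```

Under this model, an input of 1.0 corresponds to 90 degrees, so `vsin_model(degrees_to_vfpu(30))` is approximately 0.5.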

4.9.14  vcos


vcos Cosine (Single/Pair/Triple/Quad)
  vfpu_regs[%vfpu_rd] <- cos(vfpu_regs[%vfpu_rs])
vcos.s %vfpu_rd, %vfpu_rs calculate cos on single
vcos.p %vfpu_rd, %vfpu_rs calculate cos on pair
vcos.t %vfpu_rd, %vfpu_rs calculate cos on triple
vcos.q %vfpu_rd, %vfpu_rs calculate cos on quad
%vfpu_rd VFPU Vector Target Register ([s - p - t - q]reg 0..127)
%vfpu_rs VFPU Vector Source Register ([s - p - t - q]reg 0..127)
   

Note by John Kelley: trig functions on the VFPU take their argument in units of 90 degrees (quarter turns), e.g. vcos(degrees/90) or vcos(2/PI * radians)

4.9.15  vasin


vasin ArcSin (Single/Pair/Triple/Quad)
  vfpu_regs[%vfpu_rd] <- arcsin(vfpu_regs[%vfpu_rs])
vasin.s %vfpu_rd, %vfpu_rs calculate arcsin
vasin.p %vfpu_rd, %vfpu_rs calculate arcsin
vasin.t %vfpu_rd, %vfpu_rs calculate arcsin
vasin.q %vfpu_rd, %vfpu_rs calculate arcsin
%vfpu_rd VFPU Vector Target Register ([s - p - t - q]reg 0..127)
%vfpu_rs VFPU Vector Source Register ([s - p - t - q]reg 0..127)
   


4.9.16  vnrcp


vnrcp NegativeReciprocal (Single/Pair/Triple/Quad)
  vfpu_regs[%vfpu_rd] <- -1/vfpu_regs[%vfpu_rs]
vnrcp.s %vfpu_rd, %vfpu_rs calculate negative reciprocal
vnrcp.p %vfpu_rd, %vfpu_rs calculate negative reciprocal
vnrcp.t %vfpu_rd, %vfpu_rs calculate negative reciprocal
vnrcp.q %vfpu_rd, %vfpu_rs calculate negative reciprocal
%vfpu_rd VFPU Vector Target Register ([s - p - t - q]reg 0..127)
%vfpu_rs VFPU Vector Source Register ([s - p - t - q]reg 0..127)
   


4.9.17  vnsin


vnsin NegativeSin (Single/Pair/Triple/Quad)
  vfpu_regs[%vfpu_rd] <- -sin(vfpu_regs[%vfpu_rs])
vnsin.s %vfpu_rd, %vfpu_rs calculate negative sin
vnsin.p %vfpu_rd, %vfpu_rs calculate negative sin
vnsin.t %vfpu_rd, %vfpu_rs calculate negative sin
vnsin.q %vfpu_rd, %vfpu_rs calculate negative sin
%vfpu_rd VFPU Vector Target Register ([s - p - t - q]reg 0..127)
%vfpu_rs VFPU Vector Source Register ([s - p - t - q]reg 0..127)


4.9.18  vrexp2


vrexp2 ReciprocalExp2 (Single/Pair/Triple/Quad)
  vfpu_regs[%vfpu_rd] <- 1/exp2(vfpu_regs[%vfpu_rs])
vrexp2.s %vfpu_rd, %vfpu_rs calculate 1/(2^rs)
vrexp2.p %vfpu_rd, %vfpu_rs calculate 1/(2^rs)
vrexp2.t %vfpu_rd, %vfpu_rs calculate 1/(2^rs)
vrexp2.q %vfpu_rd, %vfpu_rs calculate 1/(2^rs)
%vfpu_rd VFPU Vector Target Register ([s - p - t - q]reg 0..127)
%vfpu_rs VFPU Vector Source Register ([s - p - t - q]reg 0..127)


4.9.19  vi2uc


vi2uc int to unsigned char
   
vi2uc.q %vfpu_rd, %vfpu_rs  
   
   
   


4.9.20  vi2s


vi2s int to short
   
vi2s.p %vfpu_rd, %vfpu_rs  
vi2s.q %vfpu_rd, %vfpu_rs  
   
   
   


4.9.21  vcst


vcst StoreConstant (Single/Pair/Triple/Quad)
  vfpu_regs[%vfpu_rd] <- constants[%a]
vcst.s %vfpu_rd, %a store constant into single
vcst.p %vfpu_rd, %a store constant into pair
vcst.t %vfpu_rd, %a store constant into triple
vcst.q %vfpu_rd, %a store constant into quad
%vfpu_rd VFPU Vector Destination Register ([s - p - t - q]reg 0..127)
   
%a VFPU Constant

ID Constant Value
0 n/a 0
1 HUGE 340282346638528859811704183484516925440.0
2 SQRT(2) 1.41421
3 1/SQRT(2) 0.70711
4 2/SQRT(PI) 1.12838
5 2/PI 0.63662
6 1/PI 0.31831
7 PI/4 0.78540
8 PI/2 1.57080
9 PI 3.14159
10 E 2.71828
11 LOG2E 1.44270
12 LOG10E 0.43429
13 LN2 0.69315
14 LN10 2.30259
15 2*PI 6.28319
16 PI/6 0.52360
17 LOG10TWO 0.30103
18 LOG2TEN 3.32193
19 SQRT(3)/2 0.86603
20-31 n/a 0
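
For host-side testing it can be handy to mirror the table above in C. The following is only a sketch: the array and function names are invented here, the values are the table entries written out to float precision, and IDs listed as n/a simply read as 0.

```c
#include <float.h>

/* Host-side mirror of the vcst constant table, indexed by constant ID.
   Entries 20-31 are left zero-initialised, matching the "n/a 0" rows. */
static const float vcst_table[32] = {
    0.0f,         /*  0 n/a        */
    FLT_MAX,      /*  1 HUGE       */
    1.41421356f,  /*  2 SQRT(2)    */
    0.70710678f,  /*  3 1/SQRT(2)  */
    1.12837917f,  /*  4 2/SQRT(PI) */
    0.63661977f,  /*  5 2/PI       */
    0.31830989f,  /*  6 1/PI       */
    0.78539816f,  /*  7 PI/4       */
    1.57079633f,  /*  8 PI/2       */
    3.14159265f,  /*  9 PI         */
    2.71828183f,  /* 10 E          */
    1.44269504f,  /* 11 LOG2E      */
    0.43429448f,  /* 12 LOG10E     */
    0.69314718f,  /* 13 LN2        */
    2.30258509f,  /* 14 LN10       */
    6.28318531f,  /* 15 2*PI       */
    0.52359878f,  /* 16 PI/6       */
    0.30103000f,  /* 17 LOG10TWO   */
    3.32192809f,  /* 18 LOG2TEN    */
    0.86602540f   /* 19 SQRT(3)/2  */
};

/* Returns the constant for a given ID, or 0 for IDs outside 0..31. */
float vcst_reference(unsigned id)
{
    return id < 32u ? vcst_table[id] : 0.0f;
}
```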


4.9.22  vf2in


vf2in float to int, round to nearest
   
vf2in.s %vfpu_rd, %vfpu_rs, scale  
vf2in.p %vfpu_rd, %vfpu_rs, scale  
vf2in.t %vfpu_rd, %vfpu_rs, scale  
vf2in.q %vfpu_rd, %vfpu_rs, scale  
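
The table gives no semantics for the scale operand. The C sketch below shows one plausible reading, and every part of it is an assumption rather than documented behaviour: scale (0..31) is taken as a power-of-two exponent applied before rounding, ties round to even, out-of-range values saturate, and NaN maps to the largest integer. The function name is invented for the illustration.

```c
#include <stdint.h>

/* ASSUMED model of one vf2in lane: result = round(value * 2^scale). */
int32_t vf2in_reference(float value, unsigned scale)
{
    float scaled = value * (float)(1u << (scale & 31u));

    if (scaled != scaled)          return INT32_MAX;  /* NaN: sketch's choice */
    if (scaled >= 2147483648.0f)   return INT32_MAX;  /* saturate (assumed)   */
    if (scaled < -2147483648.0f)   return INT32_MIN;

    int32_t result = (int32_t)scaled;        /* truncates toward zero */
    float frac = scaled - (float)result;     /* remainder in (-1, 1)  */

    /* Round to nearest, ties to even (tie rule is an assumption). */
    if (frac > 0.5f || (frac == 0.5f && (result & 1)))
        result++;
    else if (frac < -0.5f || (frac == -0.5f && (result & 1)))
        result--;
    return result;
}
```

Under these assumptions a scale of 2 turns 0.75 into 3, which is how a float would be converted to a fixed-point value with 2 fractional bits.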
   
   
   


4.9.23  vi2f


vi2f int to float
   
vi2f.s %vfpu_rd, %vfpu_rs, scale  
vi2f.p %vfpu_rd, %vfpu_rs, scale  
vi2f.t %vfpu_rd, %vfpu_rs, scale  
vi2f.q %vfpu_rd, %vfpu_rs, scale  
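
As with vf2in, the meaning of scale is not spelled out here. A minimal C sketch, assuming scale (0..31) is a power-of-two divisor applied after the conversion, so that the instruction turns fixed-point integers back into floats. The function name is hypothetical.

```c
#include <stdint.h>

/* ASSUMED model of one vi2f lane: result = (float)value / 2^scale. */
float vi2f_reference(int32_t value, unsigned scale)
{
    return (float)value / (float)(1u << (scale & 31u));
}
```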
   
   
   


4.9.24  vadd


vadd VectorAdd (Single/Pair/Triple/Quad)
  vfpu_regs[%vfpu_rd] <- vfpu_regs[%vfpu_rs] + vfpu_regs[%vfpu_rt]
vadd.s %vfpu_rd, %vfpu_rs, %vfpu_rt Add Single
vadd.p %vfpu_rd, %vfpu_rs, %vfpu_rt Add Pair
vadd.t %vfpu_rd, %vfpu_rs, %vfpu_rt Add Triple
vadd.q %vfpu_rd, %vfpu_rs, %vfpu_rt Add Quad
%vfpu_rt VFPU Vector Source Register ([s - p - t - q]reg 0..127)
%vfpu_rs VFPU Vector Source Register ([s - p - t - q]reg 0..127)
%vfpu_rd VFPU Vector Destination Register ([s - p - t - q]reg 0..127)
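
A host-side C model of the operation, assuming it works lane by lane (1 lane for .s, 2 for .p, 3 for .t, 4 for .q). The function name and the float-array view of a register are illustrative only.

```c
#include <stddef.h>

/* rd[i] = rs[i] + rt[i] for each lane of the vector. */
void vadd_reference(float *rd, const float *rs, const float *rt, size_t lanes)
{
    for (size_t i = 0; i < lanes; ++i)
        rd[i] = rs[i] + rt[i];
}
```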


4.9.25  vsub


vsub VectorSub (Single/Pair/Triple/Quad)
  vfpu_regs[%vfpu_rd] <- vfpu_regs[%vfpu_rs] - vfpu_regs[%vfpu_rt]
vsub.s %vfpu_rd, %vfpu_rs, %vfpu_rt Sub Single
vsub.p %vfpu_rd, %vfpu_rs, %vfpu_rt Sub Pair
vsub.t %vfpu_rd, %vfpu_rs, %vfpu_rt Sub Triple
vsub.q %vfpu_rd, %vfpu_rs, %vfpu_rt Sub Quad
%vfpu_rt VFPU Vector Source Register ([s - p - t - q]reg 0..127)
%vfpu_rs VFPU Vector Source Register ([s - p - t - q]reg 0..127)
%vfpu_rd VFPU Vector Destination Register ([s - p - t - q]reg 0..127)


4.9.26  vdiv


vdiv VectorDiv (Single/Pair/Triple/Quad)
  vfpu_regs[%vfpu_rd] <- vfpu_regs[%vfpu_rs] / vfpu_regs[%vfpu_rt]
vdiv.s %vfpu_rd, %vfpu_rs, %vfpu_rt div Single
vdiv.p %vfpu_rd, %vfpu_rs, %vfpu_rt div Pair
vdiv.t %vfpu_rd, %vfpu_rs, %vfpu_rt div Triple
vdiv.q %vfpu_rd, %vfpu_rs, %vfpu_rt div Quad
%vfpu_rt VFPU Vector Source Register ([s - p - t - q]reg 0..127)
%vfpu_rs VFPU Vector Source Register ([s - p - t - q]reg 0..127)
%vfpu_rd VFPU Vector Destination Register ([s - p - t - q]reg 0..127)


4.9.27  vmul


vmul VectorMul (Single/Pair/Triple/Quad)
  vfpu_regs[%vfpu_rd] <- vfpu_regs[%vfpu_rs] * vfpu_regs[%vfpu_rt]
vmul.s %vfpu_rd, %vfpu_rs, %vfpu_rt mul Single
vmul.p %vfpu_rd, %vfpu_rs, %vfpu_rt mul Pair
vmul.t %vfpu_rd, %vfpu_rs, %vfpu_rt mul Triple
vmul.q %vfpu_rd, %vfpu_rs, %vfpu_rt mul Quad
%vfpu_rt VFPU Vector Source Register ([s - p - t - q]reg 0..127)
%vfpu_rs VFPU Vector Source Register ([s - p - t - q]reg 0..127)
%vfpu_rd VFPU Vector Destination Register ([s - p - t - q]reg 0..127)


4.9.28  vdot


vdot VectorDotProduct (Pair/Triple/Quad)
  vfpu_regs[%vfpu_rd] <- dotproduct(vfpu_regs[%vfpu_rs], vfpu_regs[%vfpu_rt])
vdot.p %vfpu_rd, %vfpu_rs, %vfpu_rt Dot Product Pair
vdot.t %vfpu_rd, %vfpu_rs, %vfpu_rt Dot Product Triple
vdot.q %vfpu_rd, %vfpu_rs, %vfpu_rt Dot Product Quad
%vfpu_rt VFPU Vector Source Register ([s - p - t - q]reg 0..127)
%vfpu_rs VFPU Vector Source Register ([s - p - t - q]reg 0..127)
%vfpu_rd VFPU Vector Destination Register ([s - p - t - q]reg 0..127)
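
The dot product written out as a host-side C reference, assuming the usual sum of lane-wise products (2 lanes for .p, 3 for .t, 4 for .q) with a single float as the result. The function name is invented for this sketch.

```c
#include <stddef.h>

/* Returns rs[0]*rt[0] + ... + rs[lanes-1]*rt[lanes-1]. */
float vdot_reference(const float *rs, const float *rt, size_t lanes)
{
    float sum = 0.0f;
    for (size_t i = 0; i < lanes; ++i)
        sum += rs[i] * rt[i];
    return sum;
}
```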


4.9.29  vhdp


vhdp VectorHomogeneousDotProduct (Pair/Triple/Quad)
  vfpu_regs[%vfpu_rd] <- homogeneousdotproduct(vfpu_regs[%vfpu_rs], vfpu_regs[%vfpu_rt])
vhdp.p %vfpu_rd, %vfpu_rs, %vfpu_rt Homogeneous Dot Product Pair
vhdp.t %vfpu_rd, %vfpu_rs, %vfpu_rt Homogeneous Dot Product Triple
vhdp.q %vfpu_rd, %vfpu_rs, %vfpu_rt Homogeneous Dot Product Quad
%vfpu_rt VFPU Vector Source Register ([s - p - t - q]reg 0..127)
%vfpu_rs VFPU Vector Source Register ([s - p - t - q]reg 0..127)
%vfpu_rd VFPU Vector Destination Register ([s - p - t - q]reg 0..127)
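
What "homogeneous" means is not defined in this section. One common reading, used in the C sketch below purely as an assumption, is that the last element of rs is treated as 1, so the last element of rt is added unscaled. The function name is hypothetical.

```c
#include <stddef.h>

/* ASSUMED model: like a dot product, but the last lane of rs is
   replaced by an implicit 1.0. Expects lanes >= 1. */
float vhdp_reference(const float *rs, const float *rt, size_t lanes)
{
    float sum = rt[lanes - 1];              /* implicit 1.0 * rt[last] */
    for (size_t i = 0; i + 1 < lanes; ++i)
        sum += rs[i] * rt[i];
    return sum;
}
```

Under that reading, the value actually stored in the last lane of rs has no effect on the result.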


4.9.30  vidt


vidt VectorLoadIdentity (Pair/Triple/Quad)
  vfpu_regs[%vfpu_rd] <- identity vector
vidt.p %vfpu_rd Set 2x1 Vector to Identity
vidt.t %vfpu_rd Set 3x1 Vector to Identity
vidt.q %vfpu_rd Set 4x1 Vector to Identity
%vfpu_rd VFPU Vector Destination Register ([s - p - t - q]reg 0..127)


4.9.31  vabs


vabs AbsoluteValue (Single/Pair/Triple/Quad)
  vfpu_regs[%vfpu_rd] <- abs(vfpu_regs[%vfpu_rs])
vabs.s %vfpu_rd, %vfpu_rs Absolute Value Single
vabs.p %vfpu_rd, %vfpu_rs Absolute Value Pair
vabs.t %vfpu_rd, %vfpu_rs Absolute Value Triple
vabs.q %vfpu_rd, %vfpu_rs Absolute Value Quad
%vfpu_rd VFPU Vector Destination Register (m[p - t - q]reg 0..127)
%vfpu_rs VFPU Vector Source Register (m[p - t - q]reg 0..127)


4.9.32  vneg


vneg Negate (Single/Pair/Triple/Quad)
  vfpu_regs[%vfpu_rd] <- -vfpu_regs[%vfpu_rs]
vneg.s %vfpu_rd, %vfpu_rs Negate Single
vneg.p %vfpu_rd, %vfpu_rs Negate Pair
vneg.t %vfpu_rd, %vfpu_rs Negate Triple
vneg.q %vfpu_rd, %vfpu_rs Negate Quad
%vfpu_rd VFPU Vector Destination Register (m[p - t - q]reg 0..127)
%vfpu_rs VFPU Vector Source Register (m[p - t - q]reg 0..127)


4.9.33  vsgn


vsgn Sign (Single/Pair/Triple/Quad)
  vfpu_regs[%vfpu_rd] <- sign(vfpu_regs[%vfpu_rs])
vsgn.s %vfpu_rd, %vfpu_rs Get Sign Single
vsgn.p %vfpu_rd, %vfpu_rs Get Sign Pair
vsgn.t %vfpu_rd, %vfpu_rs Get Sign Triple
vsgn.q %vfpu_rd, %vfpu_rs Get Sign Quad
%vfpu_rd VFPU Vector Destination Register (m[p - t - q]reg 0..127)
%vfpu_rs VFPU Vector Source Register (m[p - t - q]reg 0..127)

Sets rd values to 1 or -1, depending on sign of input values

4.9.34  vmin


vmin VectorMin (Single/Pair/Triple/Quad)
  vfpu_regs[%vfpu_rd] <- min(vfpu_regs[%vfpu_rs], vfpu_regs[%vfpu_rt])
vmin.s %vfpu_rd, %vfpu_rs, %vfpu_rt Get Minimum Value Single
vmin.p %vfpu_rd, %vfpu_rs, %vfpu_rt Get Minimum Value Pair
vmin.t %vfpu_rd, %vfpu_rs, %vfpu_rt Get Minimum Value Triple
vmin.q %vfpu_rd, %vfpu_rs, %vfpu_rt Get Minimum Value Quad
%vfpu_rt VFPU Vector Source Register (sreg 0..127)
%vfpu_rs VFPU Vector Source Register ([p - t - q]reg 0..127)
%vfpu_rd VFPU Vector Destination Register ([s - p - t - q]reg 0..127)


4.9.35  vmax


vmax VectorMax (Single/Pair/Triple/Quad)
  vfpu_regs[%vfpu_rd] <- max(vfpu_regs[%vfpu_rs], vfpu_regs[%vfpu_rt])
vmax.s %vfpu_rd, %vfpu_rs, %vfpu_rt Get Maximum Value Single
vmax.p %vfpu_rd, %vfpu_rs, %vfpu_rt Get Maximum Value Pair
vmax.t %vfpu_rd, %vfpu_rs, %vfpu_rt Get Maximum Value Triple
vmax.q %vfpu_rd, %vfpu_rs, %vfpu_rt Get Maximum Value Quad
%vfpu_rt VFPU Vector Source Register (sreg 0..127)
%vfpu_rs VFPU Vector Source Register ([p - t - q]reg 0..127)
%vfpu_rd VFPU Vector Destination Register ([s - p - t - q]reg 0..127)


4.9.36  vtfm


vtfm VectorTransform (Pair/Triple/Quad)
  vfpu_regs[%vfpu_rd] <- transform(vfpu_matrix[%vfpu_rs], vfpu_vector[%vfpu_rt])
vtfm2.p %vfpu_rd, %vfpu_rs, %vfpu_rt Transform pair vector by pair matrix
vtfm3.t %vfpu_rd, %vfpu_rs, %vfpu_rt Transform triple vector by triple matrix
vtfm4.q %vfpu_rd, %vfpu_rs, %vfpu_rt Transform quad vector by quad matrix
%vfpu_rt VFPU Vector Source Register (qreg 0..127)
%vfpu_rs VFPU Matrix Source Register (qmatrix 0..127)
%vfpu_rd VFPU Vector Destination Register (qreg 0..127)
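
A host-side C sketch of the transform as a plain matrix-times-vector product. The storage layout is an assumption made for the example (a row-major 4x4 float array of which only the top-left n x n block is used) and says nothing about how the VFPU register file arranges a matrix. The function name is invented.

```c
#include <stddef.h>

/* rd = M * rt for an n x n matrix, n = 2 (vtfm2), 3 (vtfm3) or 4 (vtfm4).
   m points at 16 floats in row-major order; rd must not alias rt. */
void vtfm_reference(float *rd, const float *m, const float *rt, size_t n)
{
    for (size_t i = 0; i < n; ++i) {
        float sum = 0.0f;
        for (size_t k = 0; k < n; ++k)
            sum += m[i * 4 + k] * rt[k];
        rd[i] = sum;
    }
}
```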


4.9.37  vhtfm


vhtfm VectorHomogeneousTransform (Pair/Triple/Quad)
  vfpu_regs[%vfpu_rd] <- homogeneoustransform(vfpu_matrix[%vfpu_rs], vfpu_vector[%vfpu_rt])
vhtfm2.p %vfpu_rd, %vfpu_rs, %vfpu_rt Homogeneous transform quad vector by pair matrix
vhtfm3.t %vfpu_rd, %vfpu_rs, %vfpu_rt Homogeneous transform quad vector by triple matrix
vhtfm4.q %vfpu_rd, %vfpu_rs, %vfpu_rt Homogeneous transform quad vector by quad matrix
%vfpu_rt VFPU Vector Source Register (qreg 0..127)
%vfpu_rs VFPU Matrix Source Register (qmatrix 0..127)
%vfpu_rd VFPU Vector Destination Register (qreg 0..127)
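
The exact behaviour of the homogeneous variant is not described here. The C sketch below assumes the conventional meaning: the last element of the input vector is treated as 1, so the last column of the matrix is added as a translation. The row-major 4x4 layout and the function name are likewise assumptions of the example.

```c
#include <stddef.h>

/* ASSUMED model: rd = M * (rt[0], ..., rt[n-2], 1) for an n x n matrix.
   m points at 16 floats in row-major order; rd must not alias rt. */
void vhtfm_reference(float *rd, const float *m, const float *rt, size_t n)
{
    for (size_t i = 0; i < n; ++i) {
        float sum = m[i * 4 + (n - 1)];     /* column multiplied by implicit 1 */
        for (size_t k = 0; k + 1 < n; ++k)
            sum += m[i * 4 + k] * rt[k];
        rd[i] = sum;
    }
}
```

With a matrix whose last column holds a translation, this moves a point by that translation regardless of what is stored in the vector's last lane.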


4.10  Caches


There are two caches: the data cache and the instruction cache. The data cache is used when your program does a load or store to memory, and the instruction cache is used to fetch the instructions your program executes. In general you can ignore the instruction cache unless you're using dynamic code generation, though the discussion of cache locality also applies to the instruction cache. The PSP's cache structure is pretty simple compared to other CPUs. There's only a 32k L1 cache; there's no L2 cache to worry about.

4.10.1  Cache structure and operation


The 32k of cache is divided up into 64-byte chunks, called cache lines. The cache is managed in terms of cache lines, so even if you only use 1 byte of a line, all 64 bytes are allocated.

When the CPU goes to read a piece of memory, it first looks to see if there's a copy of the memory in cache. If there is, this is called a cache hit, and it can fetch the data in a few cycles. If not, this is a cache miss, and it will take a long time (possibly dozens of cycles) to fetch from main memory. However, on a cache miss, it will find a new cache line for the data, and read from main memory into the cache line; the next time you touch this 64-byte area of memory, it will probably get a cache hit.

Writes are similar. When your program writes to memory, it will just write into the cache, allocating a cache line if necessary. Subsequent writes and reads to that cache line will be cache hits.

A cache line can be in one of three states: invalid, clean or dirty. Invalid means that the cache line has no useful data, and no memory operation will hit it. Clean means that the cache line contains an up-to-date copy of a piece of main memory. Dirty means that the cache line has been written to, and main memory is out of date.

So, what does "allocate a cache line" mean? Because the cache is small relative to main memory, whenever you need a new cache line, you probably need to throw something else out. If the cache line you're replacing is invalid, then you can just start using it. If the line is clean, you can also just drop the old line and start using it. If it is dirty, however, you need to write the old contents back to memory before reusing the line; if you don't then previously written data will effectively disappear.

Note that this means that there's an indefinite, non-deterministic amount of time before a write actually hits main memory. The only thing which normally pushes a dirty cache line into memory is being replaced. If it is never replaced, then it will never be written.
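
The line arithmetic implied by these numbers can be written down in a few lines of C. This is just a sketch using the sizes quoted above (32k of cache, 64-byte lines); the names are made up, and addresses are held in uint32_t so the code also builds on a host machine.

```c
#include <stdint.h>

enum {
    CACHE_SIZE       = 32 * 1024,
    CACHE_LINE_SIZE  = 64,
    CACHE_LINE_COUNT = CACHE_SIZE / CACHE_LINE_SIZE
};

/* Start address of the cache line containing addr. */
uint32_t cache_line_base(uint32_t addr)
{
    return addr & ~(uint32_t)(CACHE_LINE_SIZE - 1);
}

/* Number of cache lines a buffer of `size` bytes at `addr` touches. */
uint32_t cache_lines_touched(uint32_t addr, uint32_t size)
{
    if (size == 0)
        return 0;
    uint32_t first = cache_line_base(addr);
    uint32_t last  = cache_line_base(addr + size - 1);
    return (last - first) / CACHE_LINE_SIZE + 1;
}
```

Note how a 2-byte buffer that straddles a 64-byte boundary still costs two whole lines, which is why aligning buffers to 64 bytes is worthwhile.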

4.10.2  Cache Coherency


All this happens transparently from a software perspective. Apart from the performance effects of all this going on, there's really no way to know it's happening, and you can safely ignore it. Or can you?

The tricky part about all this is that the CPU ends up with its own copy of pieces of main memory. If the CPU were the only user of memory in the system, then this would be fine, but the PSP has several other functional units which all use memory, and communicate with the main CPU via memory. In order for this to work, you need to make sure that every user of memory has a consistent and coherent view of memory.

In the Intel world, the CPU performs something called "cache snooping". This means that a dedicated piece of hardware looks at all memory operations to main memory, and checks to see if the CPU's cache has a more up-to-date version of the memory. It also looks at memory writes, and makes sure that the CPU's cache has the most up-to-date version of the data.

The PSP's MIPS isn't like that. It has no snooping or hardware coherency support, which leads to a problem: if you simply write out a set of commands for the GE into memory, and then tell the GE to run them, there's no guarantee that your commands have actually been written to memory by the time the GE tries to run them; they could just be still sitting there in dirty cache lines. You'll see some vertices looking fine, but others are way off in space. You'll see most of your texture, but chunks of it are missing or junk.
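
The failure mode can be reproduced in a toy model on any host machine. The sketch below is not PSP code: it models a single cache line and one line's worth of main memory, plus a "device" that reads main memory directly, the way a unit without snooping would. All names are invented for the illustration.

```c
#include <stdint.h>
#include <string.h>

enum line_state { LINE_INVALID, LINE_CLEAN, LINE_DIRTY };

struct toy_cache {
    uint8_t memory[64];      /* "main memory": one cache line's worth */
    uint8_t line[64];        /* the CPU's cached copy of that memory  */
    enum line_state state;
};

/* CPU store: fills the line on a miss, then writes only to the cache. */
void cpu_write(struct toy_cache *c, unsigned offset, uint8_t value)
{
    if (c->state == LINE_INVALID)
        memcpy(c->line, c->memory, sizeof c->line);
    c->line[offset] = value;
    c->state = LINE_DIRTY;
}

/* Device read: no snooping, so it only ever sees main memory. */
uint8_t device_read(const struct toy_cache *c, unsigned offset)
{
    return c->memory[offset];
}

/* Explicit writeback: pushes a dirty line out to main memory. */
void writeback(struct toy_cache *c)
{
    if (c->state == LINE_DIRTY) {
        memcpy(c->memory, c->line, sizeof c->memory);
        c->state = LINE_CLEAN;
    }
}
```

In this model, a value stored with cpu_write is invisible to device_read until writeback has run, which is exactly the stale-data situation described above.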

4.10.3  The Uncached Address Space


The MIPS offers one solution to this problem: the uncached address space. If you bit-wise OR your pointer with 0x40000000 you end up with a corresponding pointer in the uncached address space, which is generally known as an uncached pointer. These two pointers are aliases: they're two different pointers which refer to the same piece of physical memory. When you use the uncached pointer, the memory access completely bypasses all the machinery described above: reads will come straight from memory, and writes will go straight to memory.

This leads to a potential problem. If you use memory through the cached pointer, and then start using the uncached pointer, then you will be in a world of pain. It won't explode, crash or do anything obvious. It may seem to work perfectly well 99% of the time. But then you'll get bitten by strange, non-deterministic, elusive bugs which will move around and disappear every time you try to debug the problem.

When you use uncached memory, it completely ignores the cache, and the cache completely ignores the uncached access. If you write to cached memory, then read via uncached, you won't necessarily see the previously written value because it's still in cache. If you write via the uncached pointer, your write may get undone at some later arbitrary point when the dirty cache line eventually gets written.

The solution? You need to:

Note that even if you freshly allocate memory and never touch it with a cached pointer, you still need to write-invalidate the memory range, because it may still be partially cached from when it was previously allocated (this is quite likely, because efficient allocators will try to return still-cached memory for good cache use).
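
The aliasing between cached and uncached pointers comes down to one bit. A minimal C sketch of the address arithmetic, with made-up helper names; addresses are kept in uint32_t so the example also compiles on a host, whereas on the PSP itself you would cast a real pointer through uintptr_t.

```c
#include <stdint.h>

#define UNCACHED_BIT 0x40000000u

/* Cached address -> its uncached alias. */
uint32_t to_uncached(uint32_t addr)
{
    return addr | UNCACHED_BIT;
}

/* Uncached address -> its cached alias. */
uint32_t to_cached(uint32_t addr)
{
    return addr & ~UNCACHED_BIT;
}

/* Non-zero if addr lies in the uncached alias range. */
int is_uncached(uint32_t addr)
{
    return (addr & UNCACHED_BIT) != 0;
}
```

Both aliases name the same physical bytes, so the conversion alone does nothing to the cache contents; the write-invalidate step mentioned above is still required.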

4.10.4  Cache Management Functions


The PSP Kernel provides a set of functions for manipulating the cache: