====== PS3 tips and notes ======
===== The Cell Processor =====
The Cell Processor contains 9 processors on a single chip. One is a conventional PowerPC processor, with standard level-one (32K + 32K) and level-two (512K) caches and transparent direct access to system memory. To conserve chip space and power, this PowerPC is simpler than other common processors. It does not provide hardware support for branch prediction or out-of-order execution. This makes it perform worse than one would expect given its clock speed (3.2 GHz in the PlayStation 3). The expectation is that the PowerPC will be used in a supervisory role and the majority of processing will be delegated to the SPEs.
The 8 SPEs (Synergistic Processing Elements) are optimized for SIMD (single instruction, multiple data) processing. Each has 128 registers of 128 bits, which can store vector datatypes (such as a vector of 4 floats for {x,y,z,w}) and be used in vector operations (such as vector multiply). Like the PowerPC processor, they lack hardware branch prediction and out-of-order execution. Unlike the PowerPC processor, they have no caches and no direct access to system memory. Instead, each has a 256K local store used for both program and data. System memory is accessed through explicit asynchronous DMA requests, and each SPE may have multiple requests outstanding. With a high-speed bus between the processors inside the chip, this system is optimized for stream-style computing.
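The stream-style pattern described above can be sketched in portable C. This is only a host-side model under stated assumptions, not SPE code: the ''vec4'' type uses the GCC/Clang vector extension, ''memcpy'' stands in for the asynchronous DMA transfers, and the names ''stream_scale'' and ''CHUNK_VECS'' (and the 4 KB chunk size) are hypothetical. On a real SPE the corresponding pieces would be the ''vector float'' type with intrinsics such as ''spu_mul'', and DMA calls such as ''mfc_get'' and ''mfc_put''.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Four packed floats, e.g. {x, y, z, w} (GCC/Clang vector extension). */
typedef float vec4 __attribute__((vector_size(16)));

/* Hypothetical chunk size: 256 vectors * 16 bytes = 4 KB per transfer. */
enum { CHUNK_VECS = 256 };

/* Multiply every vector in "system memory" by factor, one chunk at a time. */
void stream_scale(vec4 *data, size_t count, vec4 factor)
{
    vec4 local[CHUNK_VECS];  /* stand-in for a buffer in the local store */

    assert(data != NULL || count == 0);

    for (size_t done = 0; done < count; done += CHUNK_VECS) {
        size_t left = count - done;
        size_t n = left < CHUNK_VECS ? left : CHUNK_VECS;

        memcpy(local, data + done, n * sizeof(vec4));  /* models a DMA get */
        for (size_t i = 0; i < n; i++)
            local[i] = local[i] * factor;              /* element-wise vector multiply */
        memcpy(data + done, local, n * sizeof(vec4));  /* models a DMA put */
    }
}
```

The model is synchronous. Because real DMA requests are asynchronous, an SPE program would typically keep two local buffers and fetch the next chunk while computing on the current one.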
On the PlayStation 3, seven of the eight SPEs are active -- the last failed verification (if it passed it went into an IBM blade system instead :-). One of these runs a dedicated hypervisor, so 6 SPEs are available to a Linux application. The program's main function is started on the PowerPC, and it can make normal Linux system calls. The PowerPC program can start threads on the SPEs and communicate with them through hardware mailbox message passing, hardware signals, or shared memory (accessed via DMA on the SPE side).
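To make the mailbox idea concrete, here is a minimal host-side model in plain C. It is an illustration under assumptions, not the hardware interface: the type ''mailbox'' and the functions ''mbox_write'' and ''mbox_read'' are hypothetical names, and the model only mimics a small FIFO of 32-bit messages (the SPE inbound mailbox is four entries deep). In a real program the PowerPC side would go through libspe2 (for example ''spe_in_mbox_write'') and the SPE side through the channel intrinsics (for example ''spu_read_in_mbox'').

```c
#include <assert.h>
#include <stdint.h>

/* Depth of the modelled queue: four 32-bit entries. */
enum { MBOX_DEPTH = 4 };

typedef struct {
    uint32_t slot[MBOX_DEPTH];
    unsigned head;   /* index of the oldest queued message */
    unsigned count;  /* number of messages currently queued */
} mailbox;

/* Sender side: returns 1 if the message was queued, 0 if the mailbox is full. */
int mbox_write(mailbox *m, uint32_t msg)
{
    assert(m);
    if (m->count == MBOX_DEPTH)
        return 0;
    m->slot[(m->head + m->count) % MBOX_DEPTH] = msg;
    m->count++;
    return 1;
}

/* Receiver side: returns 1 and stores the oldest message, 0 if the mailbox is empty. */
int mbox_read(mailbox *m, uint32_t *msg)
{
    assert(m && msg);
    if (m->count == 0)
        return 0;
    *msg = m->slot[m->head];
    m->head = (m->head + 1) % MBOX_DEPTH;
    m->count--;
    return 1;
}
```

Since each message is a single 32-bit word, mailboxes suit small control values such as a command code or a buffer address; bulk data would travel by DMA instead.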
===== Optimization =====
By default, IBM compilers apply no optimization. The optimization options recommended by NERSC/IBM for both C and C++ compilation are -O3 -qstrict -qarch=auto -qtune=auto.
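One reason to pair -O3 with -qstrict is that aggressive optimization may otherwise rearrange floating-point expressions, and floating-point addition is not associative. The sketch below shows the effect in plain C, independent of any compiler option; the function names ''sum_left'' and ''sum_right'' are hypothetical, and the second function simply writes out by hand the kind of regrouping an optimizer could apply.

```c
#include <assert.h>

/* Sum from left to right, exactly as written in the source. */
float sum_left(float a, float b, float c)
{
    return (a + b) + c;
}

/* The same three values, regrouped by hand. */
float sum_right(float a, float b, float c)
{
    return a + (b + c);
}
```

With a = 1e20f, b = -1e20f and c = 1.0f, the first form gives 1.0 because the two large values cancel before c is added. The second gives 0.0, because 1.0 is too small to change -1e20f in single precision, so c is lost before the cancellation.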
===== Problem with running the xlc compiler =====
[[http://www.ibm.com/developerworks/forums/thread.jspa?threadID=169366]]
I am just reporting that I've found the solution. As root, I could use the compiler with no problems. Here is my solution:
1. As root go to /opt/ibmcmp/xlc/8.2/bin/
cd /opt/ibmcmp/xlc/8.2/bin/
2. Change the permissions:
chmod 755 xlc_configure
3. Edit xlc_configure and find the line
if ($0 =~ /(xlc)_configure/) { $prod = $1; $ver = '8.1'; }
and change it to
if ($0 =~ /(xlc)_configure/) { $prod = $1; $ver = '8.2'; }
4. Log out of root and, as a normal user, run:
/opt/ibmcmp/xlc/8.2/bin/xlc_configure -ppugcc32 /usr/ -ppugcc64 /usr/ -spugcc32 /usr/ -target cell
WARNING 1: ppu-gcc and spu-gcc must be installed in /usr/. Otherwise, change the paths accordingly.
WARNING 2: you must have access and write privileges to /tmp/. If not, change its permissions as root.