So this is resolved.
The reason Ubuntu was giving an incorrect answer is that I created a numpy array with
randint and passed it to an OpenCL buffer, but the kernel code reads that buffer as
unsigned int. I changed the numpy definition to dtype=uint32, and the code now works as
expected on Ubuntu as well. I have not tried to work out exactly how the mismatch affected
the rest of the code, but changing the dtype fixed the problem I was seeing. I assume the
buffer contents were corrupt from the kernel's point of view and affected the simulation
in a non-trivial way.
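For anyone hitting the same thing, here is a minimal sketch of the mismatch using numpy only (no OpenCL needed). It assumes the array handed to the buffer was 64-bit, which as far as I know is what randint returns by default on 64-bit Linux, while NumPy releases before 2.0 default to 32-bit on Windows. The names (host_i64, as_seen_by_kernel, host_u32) are made up for illustration and are not from my simulation code.

```python
import numpy as np

n = 4
rng = np.random.RandomState(0)

# Assumption: stand-in for the original array. The explicit astype(int64)
# mimics the platform default integer on 64-bit Linux.
host_i64 = rng.randint(1, 100, size=n).astype(np.int64)

# A kernel that declares the buffer as `unsigned int` reads the same bytes
# in 4-byte chunks, so every 8-byte value turns into two 4-byte values.
as_seen_by_kernel = host_i64.view(np.uint32)

# The fix: make the host array match the kernel's element type.
host_u32 = rng.randint(1, 100, size=n, dtype=np.uint32)

print("host dtype      :", host_i64.dtype, host_i64)
print("kernel would see:", as_seen_by_kernel[:n], "(first n elements)")
print("fixed dtype     :", host_u32.dtype, host_u32.nbytes, "bytes")
```

The first n elements the kernel reads are not the n values the host wrote, so checking arr.dtype against the kernel argument type before creating the buffer catches this early.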
On Windows everything calculates as expected; that is, I get the results I expect in my
code test. On Ubuntu there appears to be a problem with my sorting function. (The code on
Windows and Ubuntu is exactly the same except for the program working directory.)
Specifically, particles in the simulation appear stacked to one side instead of being
uniformly randomly distributed across the simulation grid. If I remove the sorting
function, that problem goes away, but the final answer is 200, double the 100 I expect.
I see this on my Ubuntu 14.04 workstation with an NVIDIA GTX 780 Ti as well.
I assume the problem is in the combination of the CUDA toolkit and PyOpenCL. No other
system shows this behavior. I have the code running on Windows on Intel graphics cards,
AMD graphics cards, and now NVIDIA cards. It doesn't work on Linux. The equivalent CUDA
version works as expected on Linux, and an OpenCL version using SciPy Weave inline also
works as expected (only a little slower).