Hello all,
I finally bit the bullet and got radix working in PyOpenCL :)
It's also improved over the SDK example because it does keys and values,
mostly thanks to my advisor.
Additionally, this sort will handle an array of any size, as long as the size
is a power of 2. The shipped example does not allow arrays smaller than 32768,
but I've hooked up their naive scan to handle all smaller arrays.
https://github.com/enjalot/adventures_in_opencl/tree/master/experiments/rad…
all you really need are radix.py, RadixSort.cl and Scan_b.cl
some simple tests are at the bottom of radix.py
I hammered this out because I need it for a project, so it's not all that
clean, and I didn't add support for sorting on keys only (although it wouldn't
take much to add that, and I intend to at a later time when I need the
functionality). Hopefully this helps someone else out there. I'll also be
porting it using my own OpenCL C++ wrappers to include in my fluid
simulation library at some point.
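For anyone who just wants the idea before digging into radix.py and RadixSort.cl: each pass buckets a few bits of the key and moves each value along with its key. A plain-Python sketch of that key/value handling (my own CPU illustration, not the GPU kernel; `radix_sort_kv` is a made-up name):

```python
def radix_sort_kv(keys, values, bits=8):
    # LSD radix sort that carries values along with their keys,
    # mirroring what the GPU version does one pass at a time.
    mask = (1 << bits) - 1
    shift = 0
    max_key = max(keys)
    while (max_key >> shift) > 0:
        buckets = [[] for _ in range(1 << bits)]
        for k, v in zip(keys, values):
            buckets[(k >> shift) & mask].append((k, v))
        # stable scatter: bucket order preserves insertion order
        keys = [k for b in buckets for k, _ in b]
        values = [v for b in buckets for _, v in b]
        shift += bits
    return keys, values
```

Each pass is stable, which is what makes the least-significant-digit approach correct overall.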
I also began looking at AMD's radix sort from their SPH tutorial, but they
use local atomics, which are not supported on my 9600M.
--
Ian Johnson
http://enja.org
On Fri, 30 Dec 2011 15:29:14 -0800, Lewis Anderson <1plus2equal3(a)gmail.com> wrote:
> Okay. So I took a closer look at the build process. I am linking against
> /usr/lib/nvidia-current/libOpenCL.so.1.0.0, which I have confirmed exists.
> The problem is that it doesn't seem to get linked, because I get this
> warning when I do "make":
>
> warning: no library file corresponding to
> '/usr/lib/nvidia-current/libOpenCL.so.1.0.0' found (skipping)
>
> And then the final g++ command (in the make process) doesn't contain
> libOpenCL.so, meaning it doesn't get linked, which leads to the undefined
> symbol failure.
>
>
> So, the question now is: why does make fail to find libOpenCL.so? Any
> ideas?
You are not allowed to specify the "lib" and ".so.1.0.0" parts when
configuring pyopencl. I.e., when you link using "-lOpenCL", the linker
will look for "libOpenCL.so" (no version tag numbers), which it expects
to be a symlink to the desired version. Also, the path goes in the
separate "lib dir" entry, and make sure to use Nvidia headers when
linking against an Nvidia libOpenCL.
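Concretely, on the setup from this thread that might look like the following (the flag names and the header path are my assumptions; check `python configure.py --help` for the exact spellings):

```shell
# create the unversioned symlink that "-lOpenCL" resolves to
sudo ln -s libOpenCL.so.1.0.0 /usr/lib/nvidia-current/libOpenCL.so

# give configure only the directory and the bare library name,
# plus the Nvidia CL headers (include path assumed here)
python configure.py \
    --cl-lib-dir=/usr/lib/nvidia-current \
    --cl-libname=OpenCL \
    --cl-inc-dir=/usr/local/cuda/include
```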
HTH,
Andreas
PS: Please make sure to keep the list cc'd for archival. Thanks.
On Fri, 30 Dec 2011 12:57:18 -0800, Lewis Anderson <1plus2equal3(a)gmail.com> wrote:
> Hello,
>
> In the interest of improved performance, I am moving to a desktop (running
> Ubuntu 11.10) with a Geforce 7600 GT. I installed the nvidia cuda toolkit
> 4.0 and PyOpenCL for Python2.7. However, when I import PyOpenCL, I get
> "undefined symbol: clGetProgramInfo". Any ideas on how to solve this?
>
> landerson@lewis-anderson-desktop:~/Code$ python
> Python 2.7.2+ (default, Oct 4 2011, 20:03:08)
> [GCC 4.6.1] on linux2
> Type "help", "copyright", "credits" or "license" for more information.
> >>> import pyopencl
> Traceback (most recent call last):
> File "<stdin>", line 1, in <module>
> File
> "/usr/local/lib/python2.7/dist-packages/pyopencl-2011.2-py2.7-linux-i686.egg/pyopencl/__init__.py",
> line 4, in <module>
> import pyopencl._cl as _cl
> ImportError:
> /usr/local/lib/python2.7/dist-packages/pyopencl-2011.2-py2.7-linux-i686.egg/pyopencl/_cl.so:
> undefined symbol: clGetProgramInfo
> >>>
Sounds like a CL header/library mismatch. Make sure to 'rm -Rf build'
once you've fixed the root cause, before you rebuild pyopencl.
Andreas
On Sun, 25 Dec 2011 10:50:41 -0800, Lewis Anderson <1plus2equal3(a)gmail.com> wrote:
> Andreas,
>
> I think that works. I realized after sending this message that
> out-of-order execution is disabled by default, which makes everything
> a little simpler. On an unrelated note, I also discovered that
> allocation is quite slow! This discovery allowed me to achieve another
> 3x improvement.
Alloc speed depends on the implementation obviously, but it's really
quite bad on Nvidia. PyOpenCL's memory pools can help with that if you
haven't already gone a different route.
http://documen.tician.de/pyopencl/tools.html#memory-pools
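The reason a pool helps is just caching: freed blocks are kept around, binned by size, and handed back on the next same-size request instead of going back to the (slow) driver. A toy CPU-side sketch of that idea (an illustration only, not PyOpenCL's actual MemoryPool API):

```python
class SimplePool:
    """Toy allocator illustrating why pools help: freed blocks are
    binned by size and reused, avoiding a round trip to the expensive
    underlying allocator."""

    def __init__(self, raw_alloc):
        self.raw_alloc = raw_alloc   # the expensive underlying allocator
        self.free_blocks = {}        # size -> list of reusable blocks
        self.raw_calls = 0           # how often we actually hit the driver

    def allocate(self, size):
        blocks = self.free_blocks.get(size)
        if blocks:
            return blocks.pop()      # reuse: no driver call
        self.raw_calls += 1
        return self.raw_alloc(size)

    def free(self, block, size):
        self.free_blocks.setdefault(size, []).append(block)
```

With pyopencl itself, you would hand a MemoryPool to the array constructors as the allocator; see the link above for the real interface.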
> A third question: What is the best way to contribute
> code/documentation back to PyOpenCL?
Clone the git repo, send a patch to the mailing list. :)
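In shell terms, the usual workflow might look like this (the repository URL is from memory and may differ; check the PyOpenCL homepage):

```shell
git clone http://git.tiker.net/trees/pyopencl.git
cd pyopencl
# ... edit, then commit your change ...
git commit -a -m "describe the fix"
# write the commits since origin/master out as 0001-*.patch files
git format-patch origin/master
```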
Andreas
Hello,
I'm working on a neural-network-based project which relies heavily on array operations. I have moved from Numpy to PyOpenCL in order to speed up these operations, and have gotten great results (3.7x speedup on my laptop). I'm looking forward to even better results when I move to a better graphics card. However, in order to get optimal performance, I want to properly handle asynchronous behavior, but I am not sure how to do that when using built-in Array functions (+, -, /, abs(), fill(), etc.).
I have defined several custom kernels, and used them successfully, along with some more primitive operations. These custom kernels all return event objects, which I can then use with the wait_for argument to synchronize execution. However, it seems that the only way to do this with the built-in functions is by using queue.finish(), since they do not return event objects. Is there a more sophisticated way to do so?
Here is some hypothetical code:
###### using built-in functions, and queue.finish() for synchronization ######
def my_method(x, y):
    c = x * y
    queue.finish()
    event = cl_custom_kernel(c, x, y)
    event.wait()
    return c

def main():
    results = []
    for a in range(10):
        x = cl.zeros(queue, (100, 100), dtype=np.float32)
        y = cl.zeros(queue, (100, 100), dtype=np.float32)
        queue.finish()
        x.fill(1.0)
        y.fill(2.0)
        queue.finish()
        z = my_method(x, y)
        results.append(z)
###### using custom kernels and events for synchronization ######
# cl_fill(arr, val) is a kernel which does arr.fill(val)
# cl_multiply(a, b, c) is a kernel which does c = a*b
def my_method(x, y, c, x_event, y_event):
    mult_event = cl_multiply(x, y, c, wait_for=[x_event, y_event])
    final_event = cl_custom_kernel(c, x, y, wait_for=[mult_event])
    return final_event

def main():
    xs = []
    ys = []
    zs = []
    for a in range(10):
        x = cl.zeros(queue, (100, 100), dtype=np.float32)
        y = cl.zeros(queue, (100, 100), dtype=np.float32)
        z = cl.zeros(queue, (100, 100), dtype=np.float32)
        xs.append(x)
        ys.append(y)
        zs.append(z)
    queue.finish()
    events = []
    for x, y, z in zip(xs, ys, zs):
        x_evt = cl_fill(x, 1.0)
        y_evt = cl_fill(y, 2.0)
        evt = my_method(x, y, z, x_evt, y_evt)
        events.append(evt)
    for evt in events:
        evt.wait()
Now, let's assume I want each iteration to run in parallel, so that I can saturate the graphics card. Ideally, we would queue the allocation operations, then queue the fill operations (using the wait_for argument so that they wait until allocation is complete), then call the method which queues the multiplication and then the custom kernel, with each operation waiting until the operation before it finishes.
Is there any way to do this using the built-in functions? Or do I have to build custom kernels for everything so that I have access to the events for each operation?
Thanks,
Lewis
Hello,
I am having trouble with some PyOpenCL code. When I run it, it will execute a few commands properly, then fail with "invalid command queue". I searched for this on Google, and found a recommendation that I add some error handlers (http://www.khronos.org/message_boards/viewtopic.php?f=28&t=2061) to the context.
How do I get this additional error information in PyOpenCL? Or is there some other way to figure out what is causing the command queue to become invalid?
Stack Trace:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "test.py", line 59, in test_visual_cortex
net.update_weights()
File "cortex.py", line 182, in update_weights
return sum([x.do_learning() for x in self.layers])
File "cortex.py", line 570, in do_learning
total_change = sum(x.do_XCAL() for x in self.connections)
File "cortex.py", line 570, in <genexpr>
total_change = sum(x.do_XCAL() for x in self.connections)
File "cortex.py", line 1015, in do_XCAL
cl_do_xcal(self.source_activations(),self.source_idxs_cl,self.dest.act_cl,self.dest.act_l_cl,self.weights_cl,eta).wait()
File "visual_cortex.py", line 215, in source_activations
c.queue.finish()
pyopencl.LogicError: clFinish failed: invalid command queue
Relevant code:
test_visual_cortex() is the main function called. It builds a neural network, then calls reset_act(), update_activations(), and update_weights(). Each of these three functions has PyOpenCL calls, and they all appear to function properly until the crash happens. This code previously used Numpy for all array operations, and worked well. I am moving to OpenCL in order to improve performance.
def test_visual_cortex():
    net = v.build_visual_cortex_3('../data/objrec tests/opencl-test/', scale_gabor=True)
    input = net.get_layer('Gabor')
    img_path = '../facial/train/018_a2.pgm'
    input.load_image(img_path, randomize=False)
    [x.reset_act() for x in net.layers]
    net.update_activations()
    net.update_weights()
def update_activations(self):
    ''' update_activations(): update activations of all layers in network
    inputs
        none
    outputs
        maximum change for any layer
    effects
        updates the activations for every layer in the network via the
        calc_activation() method
    '''
    changes = [x.calc_activation() for x in self.layers]
    return max(changes)
def update_weights(self):
    ''' update_weights(): update weights of all layers in network
    inputs
        none
    outputs
        sum of weight change
    effects
        updates the weights for every layer in the network via the do_learning()
        method
    '''
    return sum([x.do_learning() for x in self.layers])
def calc_activation(self):
    ''' calc_activation(): calculate activation of each neuron in this layer
    algorithm
        calculate strength of each connection
        sum for each neuron in this layer
        do sigma function
        set activations based on kWTA
    output
        sum of absolute value of change in activation this time
    effects
        updates self.activations
    '''
    if len(self.connections) == 0:
        return 0
    if not self.clamped:
        if USE_GPU:
            inputs = [x.get_net_i() for x in self.connections]
            net_i = cl_a.zeros_like(self.act_cl)
            queue.finish()
            for input in inputs:
                # print input
                net_i = net_i + input
                queue.finish()
            # debug
            print "calc_activation in", self.name
            print "queue:", queue
            device = queue.get_info(cl.command_queue_info.DEVICE)
            print "DEVICE:", device
            print "MEM_SIZE:", device.get_info(cl.device_info.GLOBAL_MEM_SIZE)
            print "AVAILABLE:", device.get_info(cl.device_info.AVAILABLE)
            # end debug
            net_i = net_i / np.float32(len(self.connections))
            queue.finish()
            thresh = self.kWTA(net_i)
            y = net_i - np.float32(thresh)
            queue.finish()
            y = cl_a.maximum(y, cl_a.zeros_like(y))
            queue.finish()
            new_act = self.amplify_activation(y)
            old_act = self.act_cl
            # self.act_cl = old_act + dt*(new_act - old_act)
            temp_act = cl_a.zeros_like(old_act)
            queue.finish()
            cl_calc_activation(old_act, dt, new_act, temp_act).wait()
            self.act_cl = temp_act
def do_XCAL(self):
    ''' do_XCAL(): perform XCAL learning algorithm on all connections
    algorithm
        get xy averages
        calculate necessary change in weight using xcal function
        modify weights based on learning rate
        update xy_m averages
    output
        None **not done anymore: float describing total change in weights**
    effects
        weights and xy_m changed for each connection
    '''
    total_change = 0.0
    if not self.fixed_weights:
        if USE_GPU:
            cl_do_xcal(self.source_activations(), self.source_idxs_cl, self.dest.act_cl, self.dest.act_l_cl, self.weights_cl, eta).wait()
def source_activations(self):
    ''' source_activations(): get the proper channel of gabor_data
    returns
        numpy array for source activations
    '''
    if c.USE_GPU:
        height = self.source.gabor_data.shape[0]
        orien_idx = self.orientation_idx
        num_oriens = 4
        idxs = c.cl_a.arange(c.queue, orien_idx, orien_idx + height*num_oriens, num_oriens, dtype=np.int32)
        print c.queue
        c.queue.finish()
Thanks,
Lewis Anderson
On Tue, 20 Dec 2011 18:04:16 +0100, Thijs Withaar <thijs(a)withaar.net> wrote:
> Hi all,
>
> To keep my directory folder clean, I installed python into
> C:\Program Files (x86)\python.
>
> It turns out that the Nvidia OpenCL runtimes do not really like
> paths with spaces, even if properly quoted. Since PyOpenCL
> adds its own path as an include, this goes wrong in my case.
>
> A workaround is to modify pyopencl/__init__.py such that
> _find_pyopencl_include_path() reads:
>
> for inc_path in possible_include_paths:
>     if exists(inc_path):
>         if sys.platform.count('win') > 0:
>             import ctypes
>             buf = bytes(256, 'ascii')
>             ctypes.windll.kernel32.GetShortPathNameW(inc_path, buf, 256)
>             inc_path = buf.decode('utf-16')
>         return inc_path
>
> I know it's a nasty hack, but it's the only way I got it working.
> Does anyone have some suggestions on how to improve this?
Can you please check whether this breaks Intel/AMD OpenCL on Windows? If
not, I'd be OK with bringing it into git.
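For what it's worth, one issue with the posted hack is that GetShortPathNameW expects a wide-character output buffer rather than a byte string. A sketch of a more careful variant (untested on Windows here; `short_path` is a made-up helper name):

```python
import ctypes
import sys

def short_path(path):
    # On Windows, map a path containing spaces to its 8.3 "short" form,
    # which sidesteps compilers that mishandle quoted -I paths.
    # Elsewhere, return the path unchanged.
    if not sys.platform.startswith('win'):
        return path
    buf = ctypes.create_unicode_buffer(260)
    n = ctypes.windll.kernel32.GetShortPathNameW(path, buf, 260)
    return buf.value if n else path
```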
Andreas
Hello,
I'm a student at UCSD, and I'm using PyOpenCL for a neural network project. I love it. It is very easy to use. My only problem so far is that the documentation on many of the array functions is lacking. I think I want to use the pyopencl.array.subset_max() function, but I can't find any documentation describing what it does. The only thing I could find is the declaration:
pyopencl.array.subset_max(subset, a, queue=None)
at http://documen.tician.de/pyopencl/array.html#pyopencl.array.subset_max
So, my question: What does this mysterious function do?
What I want to do is find the kth biggest value in a PyOpenCL Array. Do any of you know any good ways to do that?
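One host-side option, after pulling the array back with .get(): keep a size-k heap (a sketch; O(n log k), which is fine when k is small relative to n):

```python
import heapq

def kth_largest(arr, k):
    # kth biggest value of a sequence, 1-indexed (k=1 is the maximum)
    return heapq.nlargest(k, arr)[-1]
```

On the device, another route would be to sort (e.g. with the radix sort posted earlier on this list) and index element n-k.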
Thanks for your time,
Lewis Anderson
>
> I know it's a nasty hack, but it's the only way I got it working.
> Does anyone have some suggestions on how to improve this?
>
In all seriousness, send this bug report to the Nvidia OpenCL team.