Consider the following code:
import numpy
import pyopencl as cl
import pyopencl.array as array
def to_device(ctx, queue, arr):
    size = arr.size * arr.dtype.itemsize
    buf = cl.Buffer(ctx, cl.mem_flags.READ_WRITE, size=size)
    arr_dev = array.Array(queue, arr.shape, arr.dtype, data=buf)
    arr_dev.set(arr, queue=queue, async_=True)
    return arr_dev
ctx = cl.create_some_context()
queue = cl.CommandQueue(ctx)
a = numpy.empty(1024 * 4000, numpy.uint64)
ad = to_device(ctx, queue, a)
b = numpy.empty(1024 * 4000, numpy.int32)
bd = to_device(ctx, queue, b)
c = numpy.empty(1024 * 32, numpy.int32)
cd = to_device(ctx, queue, c)
# queue.finish() # uncommenting this line fixes the problem
ad.get()
When I run it on Linux with a Tesla P100, using CUDA as the OpenCL
platform, the execution hangs on the last line most of the time (but not
always). Does anyone have any ideas about what might be happening here?
(If I just use array.to_device(), the problem disappears. This is an
extract from a larger program, where creating the buffer separately is
necessary.)
Hello everyone,
I'm currently thinking of adopting the Python-opencl package in the Arch
Linux community repository. I noticed that there are no tarball
signatures or signed tags for the sources.
Is this intentional? It'd be more than convenient security-wise to have a
key (or a set of keys) that I could trust when building the package :)
Thanks!
-Santiago
Code followed by errors below. Even the simplest copy/paste of sample
code from https://documen.tician.de/pyopencl/algorithm.html gives me
these errors. Does it have something to do with the chips in my Mac?
-- devin
========
import pyopencl as cl
import pyopencl.array
from pyopencl.scan import GenericScanKernel
import numpy as np
platform = cl.get_platforms()
my_devices = platform[0].get_devices(device_type=cl.device_type.GPU)
ctx = cl.Context(devices=my_devices)
queue = cl.CommandQueue(ctx)
a = cl.array.arange(queue, 10000, dtype=np.int32)
knl = GenericScanKernel(
    ctx, np.int32,
    arguments="__global int *ary",
    input_expr="ary[i]",
    scan_expr="a+b", neutral="0",
    output_statement="ary[i+1] = item;"
)
result = knl(a, queue=queue)
print(result)
========
Build on <pyopencl.Device 'Intel(R) HD Graphics 530' on 'Apple' at
0x1024500> succeeded, but said:
<program source>:151:36: warning: unused variable 'psc_seq_i'
psc_index_type psc_seq_i = psc_unit_base +
psc_K*psc_LID_0 + psc_k;
^
<program source>:806:36: warning: unused variable 'psc_seq_i'
psc_index_type psc_seq_i = psc_unit_base +
psc_K*psc_LID_0 + psc_k;
^
<program source>:803:47: warning: comparison of integers of different
signs: 'unsigned long' and 'const psc_index_type' (aka 'const int')
if (psc_K * psc_LID_0 + psc_k < psc_offset_end)
~~~~~~~~~~~~~~~~~~~~~~~~~ ^ ~~~~~~~~~~~~~~
<program source>:851:41: warning: comparison of integers of different
signs: 'unsigned long' and 'const psc_index_type' (aka 'const int')
if (psc_K*psc_LID_0 < psc_offset_end)
~~~~~~~~~~~~~~~ ^ ~~~~~~~~~~~~~~
<program source>:879:41: warning: comparison of integers of different
signs: 'unsigned long' and 'const psc_index_type' (aka 'const int')
if (psc_K*psc_LID_0 < psc_offset_end)
~~~~~~~~~~~~~~~ ^ ~~~~~~~~~~~~~~
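Note that the build log above reports that the build succeeded; the lines that follow it are compiler warnings. To check the kernel's output on the host, here is a minimal pure-Python sketch of the inclusive prefix sum that input_expr="ary[i]" with scan_expr="a+b" describes. The helper name inclusive_scan is hypothetical and not part of PyOpenCL, and the sketch ignores the i+1 offset used in output_statement.

```python
from itertools import accumulate
import operator


def inclusive_scan(values, op=operator.add):
    """Host-side reference: element i is values[0] op ... op values[i]."""
    return list(accumulate(values, op))


# Same input values as cl.array.arange(queue, 10000, dtype=np.int32).
reference = inclusive_scan(range(10000))
print(reference[:5], reference[-1])  # [0, 1, 3, 6, 10] 49995000
```

The last value, 49995000, still fits in a 32-bit signed integer, so np.int32 does not overflow for this input.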
Also posted at:
https://stackoverflow.com/questions/53576439/pyopencl-comparison-of-integer…
I'm on a MacBook Pro with Intel graphics. I have extremely limited
experience with C code.
There's not a lot of code here, but the actual problem is probably in
the C/OpenCL code generated by the Python module.
The compiler generates many repetitions of:
<program source>:819:47: warning: comparison of integers of different
signs: 'unsigned long' and 'const psc_index_type' (aka 'const int')
if (psc_K * psc_LID_0 + psc_k < psc_offset_end)
~~~~~~~~~~~~~~~~~~~~~~~~~ ^ ~~~~~~~~~~~~~~
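For readers less familiar with C: this warning concerns a comparison in which one operand is unsigned ('unsigned long') and the other is signed ('int'). C converts the signed operand to the unsigned type before comparing, so a negative value would turn into a very large positive one. The sketch below emulates that conversion in Python with ctypes. It assumes a 64-bit 'unsigned long', as in OpenCL C, and the function name is hypothetical. For non-negative values the comparison gives the same answer as ordinary arithmetic.

```python
import ctypes


def c_unsigned_less_than(lhs_unsigned, rhs_signed):
    """Emulate `unsigned long < int` in C (assuming a 64-bit unsigned long)."""
    # The signed operand is converted to unsigned, wrapping modulo 2**64.
    rhs_converted = ctypes.c_uint64(rhs_signed).value
    return lhs_unsigned < rhs_converted


print(c_unsigned_less_than(5, 10))  # True, same as ordinary arithmetic
print(c_unsigned_less_than(5, -1))  # True, because -1 becomes 2**64 - 1
```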
The code used is as follows:
import pyopencl as cl
import pyopencl.array
import pyopencl.algorithm
import numpy as np
platform = cl.get_platforms()
my_devices = platform[0].get_devices(device_type=cl.device_type.GPU)
ctx = cl.Context(devices=my_devices)
queue = cl.CommandQueue(ctx)
aryary = np.array([[10, 11, 12, 13, 14, 15, 16, 17],
                   [1, 2, 3, 4, 0, 0, 0, 0],
                   [108, 0, 0, 0, 0, 0, 0, 0]], np.int32)
cl_aryary = cl.array.to_device(queue, aryary)
lenary = np.array([8, 4, 1], np.int32)
cl_lenary = cl.array.to_device(queue, lenary)
result = cl.algorithm.copy_if(
    cl_aryary,
    "sum_array(&ary[i], len[i]) == 108",
    extra_args=[('len', cl_lenary)],
    preamble='''
    int sum_array(__global int *a, int num_elements);
    int sum_array(__global int *a, int num_elements)
    {
        int i, sum=0;
        for (i=0; i<num_elements; i++)
        {
            sum = sum + a[i];
        }
        return(sum);
    }
    ''',
    queue=queue
)
print(result)
I've tried tweaking lots of things here but can't track down what is
blocking this code from compiling, running and actually producing a
result.
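As a host-side cross-check, here is a minimal pure-Python sketch of what the predicate appears intended to select, assuming the goal is to keep each row whose first len entries sum to 108. That reading of the intent is an assumption, the helper rows_matching is hypothetical, and nothing here uses PyOpenCL.

```python
aryary = [
    [10, 11, 12, 13, 14, 15, 16, 17],
    [1, 2, 3, 4, 0, 0, 0, 0],
    [108, 0, 0, 0, 0, 0, 0, 0],
]
lenary = [8, 4, 1]


def sum_array(row, num_elements):
    """Sum the first num_elements entries of row, like the C helper above."""
    return sum(row[:num_elements])


def rows_matching(rows, lengths, target):
    """Keep the rows whose first lengths[k] entries add up to target."""
    return [row for row, n in zip(rows, lengths) if sum_array(row, n) == target]


print(rows_matching(aryary, lenary, 108))
```

Under that assumption the first and third rows qualify (their sums are 108 and 108), while the second row sums to 10.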