Hi,
I'm trying to access GL textures from pyopencl. Here is my test program:
import sys, os, pygame
from OpenGL.GL import *
sys.path.append("extern/pyopencl/build/lib.linux-x86_64-2.6")
import pyopencl

pygame.init()
screen = pygame.display.set_mode((1024, 768),
    pygame.HWSURFACE | pygame.OPENGL | pygame.DOUBLEBUF)

if pyopencl.have_gl():
    context = pyopencl.create_some_context()

tex = glGenTextures(1)
glBindTexture(GL_TEXTURE_2D, tex)
glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA, 256, 256, 0, GL_RGBA,
             GL_UNSIGNED_BYTE, None)

cltex = pyopencl.GLTexture(context, pyopencl.mem_flags.READ_ONLY,
                           GL_TEXTURE_2D, 0, tex, 2)
It fails with the following error:
Traceback (most recent call last):
File "cl.py", line 14, in <module>
cltex = pyopencl.GLTexture(context, pyopencl.mem_flags.READ_ONLY,
GL_TEXTURE_2D, 0, tex, 2)
pyopencl.LogicError: clCreateFromGLTexture2D failed: invalid context
I thought that the problem might be in pyopencl's context creation,
which doesn't take the GL context into account. I tried to fix it by
adding appropriate CL_GL_CONTEXT_KHR, CL_GLX_DISPLAY_KHR and
CL_CONTEXT_PLATFORM props to the context, but then I got another error
"pyopencl.LogicError: clCreateFromGLTexture2D failed: invalid value". I
can run kernels just fine with my setup, but this GL stuff won't work.
What am I doing wrong?
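For reference, this is the shape of the context creation I attempted, as a sketch. It assumes the get_gl_sharing_context_properties helper in pyopencl.tools, which fills in CL_GL_CONTEXT_KHR, CL_GLX_DISPLAY_KHR and CL_CONTEXT_PLATFORM from the currently bound GL context:

import pyopencl as cl
from pyopencl.tools import get_gl_sharing_context_properties

# A GL context must already be current (e.g. the pygame window above).
platform = cl.get_platforms()[0]
context = cl.Context(
    properties=[(cl.context_properties.PLATFORM, platform)]
        + get_gl_sharing_context_properties(),
    devices=None)

Even with a context built this way, the GLTexture call still fails for me, just with "invalid value" instead of "invalid context".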
Hi!
When I call all_kernels() on a program that was previously built
successfully, it always returns an empty list. I looked at the source and
found that the function create_kernels_in_program does not work correctly:
the kernels vector should be filled with num_kernels dummy values prior to
the second call of clCreateKernelsInProgram, so that
clCreateKernelsInProgram has room to store all the kernel objects in the
kernels vector.
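For illustration, here is the behavior I would expect once this is fixed; a small sketch, with a placeholder kernel:

import pyopencl as cl

ctx = cl.create_some_context()
src = "__kernel void twice(__global float *a) { a[get_global_id(0)] *= 2; }"
prg = cl.Program(ctx, src).build()

# Expected: one Kernel object per __kernel function in the program,
# not an empty list.
for knl in prg.all_kernels():
    print knl.function_name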
Best regards, Andrew Karpushin.
Skype: reven-dfg
Hi.
I think there is a typo in wrap_cl.hpp, line 2321 (
http://git.tiker.net/pyopencl.git/blob/HEAD:/src/wrapper/wrap_cl.hpp#l2321)
Instead of
if (work_dim != len(py_local_work_size))
there should be:
if (work_dim != len(py_global_work_offset))
The problem occurred when I tried to enqueue a kernel specifying the
'global_offset' parameter but no 'local_size' parameter, and got this
exception:
TypeError: object of type 'NoneType' has no len()
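A minimal reproduction sketch (the kernel itself is a placeholder; the point is passing global_offset while omitting local_size):

import numpy
import pyopencl as cl

ctx = cl.create_some_context()
queue = cl.CommandQueue(ctx)

a = numpy.zeros(256, numpy.float32)
a_buf = cl.Buffer(ctx, cl.mem_flags.READ_WRITE | cl.mem_flags.COPY_HOST_PTR,
                  hostbuf=a)
src = "__kernel void twice(__global float *a) { a[get_global_id(0)] *= 2; }"
prg = cl.Program(ctx, src).build()

# global_offset is given but local_size is not, so the mistyped check
# calls len() on py_local_work_size, which is None.
prg.twice(queue, (240,), a_buf, global_offset=(16,))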
Best regards, Andrew Karpushin.
Skype: reven-dfg
I experience some very odd behavior from either pyopencl or opencl. I have
produced a boiled-down example and posted it below. Basically the test kernel
below takes some large arrays as arguments and initializes one of these with
1's. It also performs a for-loop in which it initializes some local data
structures with 1's and does nothing else. In theory this for-loop could be
omitted, as it does nothing relevant to the output, but... The strange thing
is that when the for-loop has many iterations (e.g. 1000), things go horribly
wrong: the screen flickers, and the output array is not initialized with 1's.
If the loop only has a few iterations (e.g. 10), everything works fine.
Also, if the variable "rows" in the outer python code is lowered to e.g. 144
instead of 3344, it also works fine, even with 1000 for-iterations.
Can anyone explain what is going on here?!
(The code takes around 14 seconds to complete on my laptop)
I use Mac OS X 10.6.3 and pyopencl-0.91.4, and have just installed
gpucomputingsdk_2.3a_macos_32 from
http://developer.nvidia.com/object/opencl-download.html. However, I'm not sure
how to tell if pyopencl really uses this specific SDK.
My machine is a macbook pro, and get_devices(cl.device_type.GPU) returns
[<pyopencl.Device 'GeForce 9400M' at 0x2022600>, <pyopencl.Device 'GeForce
9600M GT' at 0x1022600>]
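The best I have come up with for checking is to print the platform and driver versions via the standard info queries (a sketch):

import pyopencl as cl

for platform in cl.get_platforms():
    print platform.name, platform.vendor, platform.version
    for dev in platform.get_devices(cl.device_type.GPU):
        print " ", dev.name, dev.version, dev.driver_version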
Here is the example code: =============================================
import sys
import struct
import pyopencl as cl
import numpy

block_size = 16
matrixLength = 3101104
rows = 3344

row2width = numpy.zeros(rows, numpy.int32)
row2startIdx = numpy.zeros(rows, numpy.int32)
matrix = numpy.zeros(matrixLength, numpy.int32)

pl = cl.get_platforms()
devs = pl[0].get_devices(cl.device_type.GPU)
if block_size > devs[0].get_info(cl.device_info.MAX_WORK_GROUP_SIZE):
    print "Error: block_size is larger than MAX_WORK_GROUP_SIZE..."
    exit(1)

ctx = cl.Context(devs)
queue = cl.CommandQueue(ctx)
mf = cl.mem_flags

src = """
// Thread block size
#define BLOCK_SIZE 16

__kernel void matrixMul(__global int* C, int CSize, __global int* A,
                        __global int* rowWidths, __global int* rowStartIdxs)
{
    int bi = get_group_id(0);
    int bj = get_group_id(1);
    int ti = get_local_id(0);
    int tj = get_local_id(1);
    int rowAIdx = bi * BLOCK_SIZE + ti;
    int rowBIdx = bj * BLOCK_SIZE + tj;
    int cOut = 1;
    for (int x = 0; x < 1000; x++) {
        __local int As[BLOCK_SIZE][BLOCK_SIZE];
        __local int Bs[BLOCK_SIZE][BLOCK_SIZE];
        As[ti][tj] = 1;
        Bs[ti][tj] = 1;
        barrier(CLK_LOCAL_MEM_FENCE);
    }
    C[rowBIdx * CSize + rowAIdx] = cOut;
}
"""

prg = cl.Program(ctx, src).build()

matrix_buf = cl.Buffer(ctx, mf.READ_ONLY | mf.COPY_HOST_PTR,
                       hostbuf=numpy.array(matrix).astype(numpy.int32))
row2width_buf = cl.Buffer(ctx, mf.READ_ONLY | mf.COPY_HOST_PTR,
                          hostbuf=numpy.array(row2width).astype(numpy.int32))
row2startIdx_buf = cl.Buffer(ctx, mf.READ_ONLY | mf.COPY_HOST_PTR,
                             hostbuf=numpy.array(row2startIdx).astype(numpy.int32))

o = numpy.zeros(rows * rows).astype(numpy.int32)
o_buf = cl.Buffer(ctx, mf.READ_WRITE | mf.COPY_HOST_PTR, hostbuf=o)
w_o_buf = struct.pack("i", rows)

prg.matrixMul(queue, [rows, rows], o_buf, w_o_buf, matrix_buf, row2width_buf,
              row2startIdx_buf, local_size=(block_size, block_size))
cl.enqueue_read_buffer(queue, o_buf, o).wait()

i = numpy.nonzero(o)
print len(i[0])
On Thursday 06 May 2010, Andreas Klöckner wrote:
> Hi all,
>
> just a quick heads-up that I will be moving the PyCUDA list to a
> different server today. There might be a short period where the list is
> unavailable, but I'll try to keep this minimal. All should be back to
> normal by tonight at the latest. If you notice breakage after that,
> please let me know. There should not be any user-visible changes.
The move is done, and everything should be back to working order. DNS
changes might still not have propagated everywhere, but should do so
soon. Let me know if you notice any issues.
Thanks for your patience,
Andreas
In CUDA, you can create a struct with pointers to pass into kernels if you
have too many arguments and hit a limit. Is there any way to do this in
(Py)OpenCL? It appears OpenCL really wants to abstract a "buffer" away from
an underlying "pointer" on the device, presumably so the device is free to
move buffers around as long as it maintains the buffer abstraction.
Given that, how does one get over the kernel argument limitation? Do you
have to try to manage a big chunk of memory manually and offset into that?
That's very fragile if you allocate and deallocate memory frequently, of
varying sizes and so on.
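To make that concrete, here is the kind of scheme I mean; a rough sketch in which all names and the kernel are made up for illustration. One big pool buffer is allocated, the host hands out element offsets into it, and the offsets travel to the kernel in a single small int buffer:

import numpy
import pyopencl as cl

ctx = cl.create_some_context()
queue = cl.CommandQueue(ctx)
mf = cl.mem_flags

# One big pool instead of many separate buffers.
pool = cl.Buffer(ctx, mf.READ_WRITE, size=1 << 20)

# Host-side "allocator": each logical array is an offset into the pool,
# counted in elements rather than bytes to keep kernel indexing simple.
offsets = numpy.array([0, 1024, 4096], numpy.int32)
offsets_buf = cl.Buffer(ctx, mf.READ_ONLY | mf.COPY_HOST_PTR, hostbuf=offsets)

src = """
__kernel void add_first_two(__global float *pool, __global int *offs)
{
    int gid = get_global_id(0);
    __global float *a = pool + offs[0];
    __global float *b = pool + offs[1];
    __global float *out = pool + offs[2];
    out[gid] = a[gid] + b[gid];
}
"""
prg = cl.Program(ctx, src).build()
prg.add_first_two(queue, (1024,), pool, offsets_buf)

This keeps the kernel's argument count fixed no matter how many logical arrays there are, but it means writing and maintaining the allocator myself.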
Should I just run a kernel to grab the address of buffers and assume it
won't change?
Also, is there any way to see how much register spillage into global memory a
particular kernel will have if run with a particular global/local size, at
least when using an NVIDIA device, the way you can with CUDA? I'm seeing
strange disparities in performance between CUDA and OpenCL on the same
kernel, and I'm trying to get to the bottom of it.
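The closest I have found so far is NVIDIA's vendor-specific -cl-nv-verbose build option, which, as far as I can tell, puts the ptxas statistics (registers per thread, spill loads/stores) into the build log; a sketch, with a placeholder kernel:

import pyopencl as cl

ctx = cl.create_some_context()
src = "__kernel void scale(__global float *a) { a[get_global_id(0)] *= 2; }"
prg = cl.Program(ctx, src).build(options="-cl-nv-verbose")

# The ptxas statistics show up in the per-device build log.
for dev in ctx.devices:
    print prg.get_build_info(dev, cl.program_build_info.LOG)

Those are per-kernel compile-time numbers, though, not per-launch, so they may not fully answer the global/local size part of the question.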
Thanks,
Cyrus
Hi Alan,
On Monday 12 April 2010, Alan wrote:
> I contacted you, I guess in Aug 2009. At that time I couldn't do much with
> pycuda or pyopencl because I wanted them all in 64-bit.
>
> So for pycuda I sent some issues in the mailing list.
>
> For pyopencl, I got it all working (including test), but with some
> modifications.
>
> I use OSX 10.6.3 with Xcode 3.2.2, CUDA 3.0 and pycuda and pyopencl via
> git.
>
> ld: library not found for -lOpenCL
>
> I've just removed "-lOpenCL" and reentered the command by hand:
> [snip]
>
> and then make install, test, example all fine.
I've just committed an improved build configuration for OS X that leaves
out -lOpenCL by default. Can you please test if that makes things
better?
Thanks for your feedback,
Andreas