Re: [PyCUDA] pycuda patch for 'flat' eggs
by Andreas Klöckner

Hi Maarten,
On Tuesday 03 November 2009, you wrote:
> I've been playing with your pycuda package, and I really like
> it! It's much more productive than compiling etc.
> I have pycuda installed with --single-version-externally-managed and a
> different prefix. This causes pycuda not to find the header files.
> I've attached the diff and new compiler.py file to fix this.
Merged in release-0.93 and master.
Thanks for the patch,
Andreas
PS: Please direct stuff like this to the mailing list next time.

PyCuda Memory Question
by Aaron Greenblatt

Hi,
I'm new to Python but have coded stuff in C / CUDA before.
I am trying to copy some variables from Python / Numpy to a GPU, and then back
to the host again. When I get the stuff back from the GPU, I appear to get a few
random NaN and Inf values, and I'm confused as to why. I have
a few C source modules in the Python script, and when I remove them, some of
the Infs go away. This confuses me even more, as I never even called the
functions in the C source modules, so removing them shouldn't make a difference.
(Or am I missing something there too?)
It almost seems like the system / video driver is overwriting the memory that I
write on the video card. Is this a possibility and, if so, how does one deal
with it in PyCuda? (I haven't run into this issue when working on C / CUDA
before, but my dataset was also pretty small). I'm going to look through
nVidia's CUDA programming guide again to make sure that I'm not missing
something obvious.
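One likely host-side explanation is offered here as an editorial aside; the poster did not confirm it. `numpy.empty` and `numpy.empty_like` return arrays containing whatever bytes were already in memory. Read as float64, those bytes can be NaN or values too large for float32, and `astype(numpy.float32)` turns the oversized ones into Inf. A plain host-to-device-to-host copy returns exactly the bytes it was given, so the garbage would come back unchanged. This would also explain why unrelated edits, such as removing the `SourceModule` blocks, change which values appear. The sketch below uses hand-picked bit patterns (an assumption for illustration) to show the effect deterministically:

```python
import warnings
import numpy

# Illustrative stand-ins for leftover bytes that numpy.empty might return.
# These particular bit patterns are chosen by hand, not taken from the post.
garbage_bits = numpy.array(
    [0x0000000000000000,   # +0.0
     0x7FF8000000000000,   # a quiet NaN
     0x7E30000000000000],  # 2.0**996 (about 6.7e299), far above float32's ~3.4e38 max
    dtype=numpy.uint64)
garbage = garbage_bits.view(numpy.float64)

# The same float64 -> float32 conversion the script performs with astype().
with warnings.catch_warnings():
    warnings.simplefilter("ignore", RuntimeWarning)  # newer NumPy warns on overflow
    as_single = garbage.astype(numpy.float32)
print(as_single)  # contains 0.0, a NaN and an Inf

# Zero-initialized buffers carry no leftover bytes, so the cast stays finite.
safe = numpy.zeros(3).astype(numpy.float32)
print(numpy.isfinite(safe).all())  # True
```

In the scripts, this would mean replacing `numpy.empty`/`numpy.empty_like` with `numpy.zeros`/`numpy.zeros_like` for every buffer that is copied to the device before a kernel has written it.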
Also, I know that I need to optimize the code in the C modules - for now I just
want to get something working, and then I'll write C code that uses the hardware
better.
I've attached source code and output with and without the C source modules.
Does anyone have thoughts as to what's going on here? Thanks for your help!
Aaron
**** Script without C source ***
# Sample source code from the Tutorial Introduction in the documentation.
import pycuda.driver as cuda
import pycuda.autoinit
from pycuda.compiler import SourceModule
import numpy
x = numpy.ones([4,5]) * .01
ydes = x
y = numpy.empty_like(x)
L1neurons = 4
L2neurons = 4
L3neurons = 4
L1weightsPerNeuron = x.size
L2weightsPerNeuron = L1neurons
L3weightsPerNeuron = L2neurons
weightsL1 = numpy.ones([L1neurons,L1weightsPerNeuron])
weightsL2 = numpy.ones([L2neurons,L2weightsPerNeuron])
weightsL3 = numpy.ones([L3neurons,L3weightsPerNeuron])
L1s = numpy.empty([L1neurons])
L2s = numpy.empty([L2neurons])
L3s = numpy.empty([L3neurons])
L1xout = numpy.empty_like(L1s)
L1PreAdd = numpy.empty_like(weightsL1)
L2xout = numpy.empty_like(L2s)
L2PreAdd = numpy.empty_like(weightsL2)
L3xout = numpy.empty_like(L3s)
L3PreAdd = numpy.empty_like(weightsL3)
# convert these variables to float singles for GPU use
x = x.astype(numpy.float32)
ydes = ydes.astype(numpy.float32)
y = y.astype(numpy.float32)
weightsL1 = weightsL1.astype(numpy.float32)
weightsL2 = weightsL2.astype(numpy.float32)
weightsL3 = weightsL3.astype(numpy.float32)
L1s = L1s.astype(numpy.float32)
L2s = L2s.astype(numpy.float32)
L3s = L3s.astype(numpy.float32)
L1PreAdd = L1PreAdd.astype(numpy.float32)
L1xout = L1xout.astype(numpy.float32)
L2PreAdd = L2PreAdd.astype(numpy.float32)
L2xout = L2xout.astype(numpy.float32)
L3PreAdd = L3PreAdd.astype(numpy.float32)
L3xout = L3xout.astype(numpy.float32)
# allocate GPU memory
GPUx = cuda.mem_alloc(x.size * x.dtype.itemsize)
GPUydes = cuda.mem_alloc(ydes.size * ydes.dtype.itemsize)
GPUy = cuda.mem_alloc(y.size * y.dtype.itemsize)
GPUweightsL1 = cuda.mem_alloc(weightsL1.size * weightsL1.dtype.itemsize)
GPUweightsL2 = cuda.mem_alloc(weightsL2.size * weightsL2.dtype.itemsize)
GPUweightsL3 = cuda.mem_alloc(weightsL3.size * weightsL3.dtype.itemsize)
GPUL1s = cuda.mem_alloc(L1s.size * L1s.dtype.itemsize)
GPUL2s = cuda.mem_alloc(L2s.size * L2s.dtype.itemsize)
GPUL3s = cuda.mem_alloc(L3s.size * L3s.dtype.itemsize)
GPUL1PreAdd = cuda.mem_alloc(L1PreAdd.size * L1PreAdd.dtype.itemsize)
GPUL1xout = cuda.mem_alloc(L1xout.size * L1xout.dtype.itemsize)
GPUL2PreAdd = cuda.mem_alloc(L2PreAdd.size * L2PreAdd.dtype.itemsize)
GPUL2xout = cuda.mem_alloc(L2xout.size * L2xout.dtype.itemsize)
GPUL3PreAdd = cuda.mem_alloc(L3PreAdd.size * L3PreAdd.dtype.itemsize)
GPUL3xout = cuda.mem_alloc(L3xout.size * L3xout.dtype.itemsize)
# copy variables to GPU
cuda.memcpy_htod(GPUx, x)
cuda.memcpy_htod(GPUydes, ydes)
cuda.memcpy_htod(GPUy, y)
cuda.memcpy_htod(GPUweightsL1, weightsL1)
cuda.memcpy_htod(GPUweightsL2, weightsL2)
cuda.memcpy_htod(GPUweightsL3, weightsL3)
cuda.memcpy_htod(GPUL1s, L1s)
cuda.memcpy_htod(GPUL2s, L2s)
cuda.memcpy_htod(GPUL3s, L3s)
cuda.memcpy_htod(GPUL1PreAdd, L1PreAdd)
cuda.memcpy_htod(GPUL1xout, L1xout)
cuda.memcpy_htod(GPUL2PreAdd, L2PreAdd)
cuda.memcpy_htod(GPUL2xout, L2xout)
cuda.memcpy_htod(GPUL3PreAdd, L3PreAdd)
cuda.memcpy_htod(GPUL3xout, L3xout)
# Print stuff
cuda.memcpy_dtoh(x, GPUx)
cuda.memcpy_dtoh(ydes, GPUydes)
cuda.memcpy_dtoh(y, GPUy)
cuda.memcpy_dtoh(weightsL1, GPUweightsL1)
cuda.memcpy_dtoh(weightsL2, GPUweightsL2)
cuda.memcpy_dtoh(weightsL3, GPUweightsL3)
cuda.memcpy_dtoh(L1s, GPUL1s)
cuda.memcpy_dtoh(L2s, GPUL2s)
cuda.memcpy_dtoh(L3s, GPUL3s)
cuda.memcpy_dtoh(L1PreAdd, GPUL1PreAdd)
cuda.memcpy_dtoh(L1xout, GPUL1xout)
cuda.memcpy_dtoh(L2PreAdd, GPUL2PreAdd)
cuda.memcpy_dtoh(L2xout, GPUL2xout)
cuda.memcpy_dtoh(L3PreAdd, GPUL3PreAdd)
cuda.memcpy_dtoh(L3xout, GPUL3xout)
print("x:")
print(x)
print("y")
print(y)
print("ydes")
print(ydes)
print("weightsL1")
print(weightsL1)
print("L1preadd")
print(L1PreAdd)
print("L1s")
print(L1s)
print("L1xout")
print(L1xout)
print("weightsL2")
print(weightsL2)
print("L2preadd")
print(L2PreAdd)
print("L2s")
print(L2s)
print("L2xout")
print(L2xout)
print("weightsL3")
print(weightsL3)
print("L3preadd")
print(L3PreAdd)
print("L3s")
print(L3s)
print("L3xout")
print(L3xout)
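The setup above creates float64 arrays, converts each one with `astype`, and computes byte counts by hand. A more compact setup is sketched below as an editorial suggestion, not part of the original post. It creates zero-filled float32 arrays directly and uses `ndarray.nbytes`, which equals `size * dtype.itemsize`. The dictionary layout is hypothetical; its key names only mirror the script's variables. The device side (`cuda.mem_alloc(a.nbytes)` followed by `cuda.memcpy_htod`) needs a CUDA device, so it is left out here:

```python
import numpy

L1neurons = L2neurons = L3neurons = 4
x = numpy.full((4, 5), 0.01, dtype=numpy.float32)
ydes = x.copy()

# Hypothetical layout: name -> shape, mirroring the arrays in the script.
shapes = {
    "y": x.shape,
    "weightsL1": (L1neurons, x.size),
    "weightsL2": (L2neurons, L1neurons),
    "weightsL3": (L3neurons, L2neurons),
    "L1s": (L1neurons,), "L2s": (L2neurons,), "L3s": (L3neurons,),
    "L1xout": (L1neurons,), "L2xout": (L2neurons,), "L3xout": (L3neurons,),
    "L1PreAdd": (L1neurons, x.size),
    "L2PreAdd": (L2neurons, L1neurons),
    "L3PreAdd": (L3neurons, L2neurons),
}

# Allocate every buffer as zero-initialized single precision up front,
# so nothing uninitialized is ever copied to the device.
host = {name: numpy.zeros(shape, dtype=numpy.float32)
        for name, shape in shapes.items()}
for name in ("weightsL1", "weightsL2", "weightsL3"):
    host[name][:] = 1.0

for name, a in host.items():
    # nbytes is the byte count the script computes by hand for cuda.mem_alloc.
    assert a.nbytes == a.size * a.dtype.itemsize
    print(name, a.shape, a.nbytes)
```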
****** Output without C source *****
x:
[[ 0.01 0.01 0.01 0.01 0.01]
[ 0.01 0.01 0.01 0.01 0.01]
[ 0.01 0.01 0.01 0.01 0.01]
[ 0.01 0.01 0.01 0.01 0.01]]
y
[[ 0. 0. NaN 0. 0.]
[ 0. 0. 0. 0. 0.]
[ 0. 0. 0. 0. 0.]
[ 0. 0. 0. 0. 0.]]
ydes
[[ 0.01 0.01 0.01 0.01 0.01]
[ 0.01 0.01 0.01 0.01 0.01]
[ 0.01 0.01 0.01 0.01 0.01]
[ 0.01 0.01 0.01 0.01 0.01]]
weightsL1
[[ 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1.
1. 1.]
[ 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1.
1. 1.]
[ 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1.
1. 1.]
[ 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1.
1. 1.]]
L1preadd
[[ 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
0. 0.]
[ 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
0. 0.]
[ 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
0. 0.]
[ 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
0. 0.]]
L1s
[ 0. 0. 0. 0.]
L1xout
[ 0. 0. 0. 0.]
weightsL2
[[ 1. 1. 1. 1.]
[ 1. 1. 1. 1.]
[ 1. 1. 1. 1.]
[ 1. 1. 1. 1.]]
L2preadd
[[ 0. 0. 0. 0.]
[ 0. 0. 0. 0.]
[ 0. 0. 0. 0.]
[ 0. 0. 0. 0.]]
L2s
[ 0. 0. 0. 0.]
L2xout
[ 0. 0. 0. 0.]
weightsL3
[[ 1. 1. 1. 1.]
[ 1. 1. 1. 1.]
[ 1. 1. 1. 1.]
[ 1. 1. 1. 1.]]
L3preadd
[[ 0. 0. 0. 0.]
[ 0. 0. 0. 0.]
[ 0. 0. 0. 0.]
[ 0. 0. 0. 0.]]
L3s
[ 0. 0. 0. 0.]
L3xout
[ 0. 0. 0. 0.]
******* Script with C Source ***************
# Sample source code from the Tutorial Introduction in the documentation.
import pycuda.driver as cuda
import pycuda.autoinit
from pycuda.compiler import SourceModule
import numpy
x = numpy.ones([4,5]) * .01
ydes = x
y = numpy.empty_like(x)
L1neurons = 4
L2neurons = 4
L3neurons = 4
L1weightsPerNeuron = x.size
L2weightsPerNeuron = L1neurons
L3weightsPerNeuron = L2neurons
weightsL1 = numpy.ones([L1neurons,L1weightsPerNeuron])
weightsL2 = numpy.ones([L2neurons,L2weightsPerNeuron])
weightsL3 = numpy.ones([L3neurons,L3weightsPerNeuron])
L1s = numpy.empty([L1neurons])
L2s = numpy.empty([L2neurons])
L3s = numpy.empty([L3neurons])
L1xout = numpy.empty_like(L1s)
L1PreAdd = numpy.empty_like(weightsL1)
L2xout = numpy.empty_like(L2s)
L2PreAdd = numpy.empty_like(weightsL2)
L3xout = numpy.empty_like(L3s)
L3PreAdd = numpy.empty_like(weightsL3)
# convert these variables to float singles for GPU use
x = x.astype(numpy.float32)
ydes = ydes.astype(numpy.float32)
y = y.astype(numpy.float32)
weightsL1 = weightsL1.astype(numpy.float32)
weightsL2 = weightsL2.astype(numpy.float32)
weightsL3 = weightsL3.astype(numpy.float32)
L1s = L1s.astype(numpy.float32)
L2s = L2s.astype(numpy.float32)
L3s = L3s.astype(numpy.float32)
L1PreAdd = L1PreAdd.astype(numpy.float32)
L1xout = L1xout.astype(numpy.float32)
L2PreAdd = L2PreAdd.astype(numpy.float32)
L2xout = L2xout.astype(numpy.float32)
L3PreAdd = L3PreAdd.astype(numpy.float32)
L3xout = L3xout.astype(numpy.float32)
# allocate GPU memory
GPUx = cuda.mem_alloc(x.size * x.dtype.itemsize)
GPUydes = cuda.mem_alloc(ydes.size * ydes.dtype.itemsize)
GPUy = cuda.mem_alloc(y.size * y.dtype.itemsize)
GPUweightsL1 = cuda.mem_alloc(weightsL1.size * weightsL1.dtype.itemsize)
GPUweightsL2 = cuda.mem_alloc(weightsL2.size * weightsL2.dtype.itemsize)
GPUweightsL3 = cuda.mem_alloc(weightsL3.size * weightsL3.dtype.itemsize)
GPUL1s = cuda.mem_alloc(L1s.size * L1s.dtype.itemsize)
GPUL2s = cuda.mem_alloc(L2s.size * L2s.dtype.itemsize)
GPUL3s = cuda.mem_alloc(L3s.size * L3s.dtype.itemsize)
GPUL1PreAdd = cuda.mem_alloc(L1PreAdd.size * L1PreAdd.dtype.itemsize)
GPUL1xout = cuda.mem_alloc(L1xout.size * L1xout.dtype.itemsize)
GPUL2PreAdd = cuda.mem_alloc(L2PreAdd.size * L2PreAdd.dtype.itemsize)
GPUL2xout = cuda.mem_alloc(L2xout.size * L2xout.dtype.itemsize)
GPUL3PreAdd = cuda.mem_alloc(L3PreAdd.size * L3PreAdd.dtype.itemsize)
GPUL3xout = cuda.mem_alloc(L3xout.size * L3xout.dtype.itemsize)
# copy variables to GPU
cuda.memcpy_htod(GPUx, x)
cuda.memcpy_htod(GPUydes, ydes)
cuda.memcpy_htod(GPUy, y)
cuda.memcpy_htod(GPUweightsL1, weightsL1)
cuda.memcpy_htod(GPUweightsL2, weightsL2)
cuda.memcpy_htod(GPUweightsL3, weightsL3)
cuda.memcpy_htod(GPUL1s, L1s)
cuda.memcpy_htod(GPUL2s, L2s)
cuda.memcpy_htod(GPUL3s, L3s)
cuda.memcpy_htod(GPUL1PreAdd, L1PreAdd)
cuda.memcpy_htod(GPUL1xout, L1xout)
cuda.memcpy_htod(GPUL2PreAdd, L2PreAdd)
cuda.memcpy_htod(GPUL2xout, L2xout)
cuda.memcpy_htod(GPUL3PreAdd, L3PreAdd)
cuda.memcpy_htod(GPUL3xout, L3xout)
# C source code for stuff we do on GPU
ForwardMult = SourceModule("""
__global__ void layer1forward(float *x, float *weights, float *preAdd)
{
// this does the multiplication in the forward neural net and outputs a
// pre-addition matrix
//initialize variables
int elementIdx = threadIdx.x + blockIdx.x*blockDim.x;
int neuronIdx = blockIdx.y;
int numweights = blockDim.x * gridDim.x;
// do multiply
preAdd[neuronIdx*numweights+elementIdx] = weights[neuronIdx*numweights + elementIdx] * x[elementIdx];
}
""")
ForwardAdd = SourceModule("""
__global__ void layer1forward(float *preAdd, float *s)
{
// this adds together the products from ForwardMult.
// note: the sum accumulates into s, so s must be zeroed before launch
// do add
int numweights = 20;
for(int i = 0; i< numweights; i++) {
s[threadIdx.x] = s[threadIdx.x] + preAdd[numweights * threadIdx.x + i];
}
}
""")
ForwardSigmoid = SourceModule("""
__global__ void sigmoid(float *s, float *xout)
{
// this applies the sigmoid function
xout[threadIdx.x] = (1 - exp(-2*s[threadIdx.x])) / (1 + exp(-2*s[threadIdx.x]));
}
""")
# Print stuff
cuda.memcpy_dtoh(x, GPUx)
cuda.memcpy_dtoh(ydes, GPUydes)
cuda.memcpy_dtoh(y, GPUy)
cuda.memcpy_dtoh(weightsL1, GPUweightsL1)
cuda.memcpy_dtoh(weightsL2, GPUweightsL2)
cuda.memcpy_dtoh(weightsL3, GPUweightsL3)
cuda.memcpy_dtoh(L1s, GPUL1s)
cuda.memcpy_dtoh(L2s, GPUL2s)
cuda.memcpy_dtoh(L3s, GPUL3s)
cuda.memcpy_dtoh(L1PreAdd, GPUL1PreAdd)
cuda.memcpy_dtoh(L1xout, GPUL1xout)
cuda.memcpy_dtoh(L2PreAdd, GPUL2PreAdd)
cuda.memcpy_dtoh(L2xout, GPUL2xout)
cuda.memcpy_dtoh(L3PreAdd, GPUL3PreAdd)
cuda.memcpy_dtoh(L3xout, GPUL3xout)
print("x:")
print(x)
print("y")
print(y)
print("ydes")
print(ydes)
print("weightsL1")
print(weightsL1)
print("L1preadd")
print(L1PreAdd)
print("L1s")
print(L1s)
print("L1xout")
print(L1xout)
print("weightsL2")
print(weightsL2)
print("L2preadd")
print(L2PreAdd)
print("L2s")
print(L2s)
print("L2xout")
print(L2xout)
print("weightsL3")
print(weightsL3)
print("L3preadd")
print(L3PreAdd)
print("L3s")
print(L3s)
print("L3xout")
print(L3xout)
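The three kernels above are compiled but never launched. Before launching them, a host-side reference is useful for checking the GPU results. The sketch below mirrors the indexing in the multiply kernel and the loop in the add kernel (which is also named `layer1forward`), assuming 20 weights per neuron as hard-coded there. It is a hypothetical helper, not code from the thread. Two points are worth noting. First, the add kernel accumulates into `s` instead of overwriting it, so `s` has to start at zero; with the script's `numpy.empty` buffer it would start from garbage. Second, `(1 - exp(-2s)) / (1 + exp(-2s))` is exactly `tanh(s)`:

```python
import numpy

def forward_reference(x, weights):
    """Hypothetical host-side mirror of the ForwardMult, ForwardAdd and ForwardSigmoid kernels."""
    x = numpy.asarray(x, dtype=numpy.float32).ravel()
    weights = numpy.asarray(weights, dtype=numpy.float32)
    numweights = x.size  # the add kernel hard-codes 20 for a 4x5 input

    # ForwardMult: preAdd[n*numweights + e] = weights[n*numweights + e] * x[e]
    pre_add = weights.reshape(-1, numweights) * x

    # ForwardAdd: s[n] += preAdd[n*numweights + i] for every i; s starts at zero here
    s = numpy.zeros(pre_add.shape[0], dtype=numpy.float32)
    s += pre_add.sum(axis=1)

    # ForwardSigmoid: (1 - e^(-2s)) / (1 + e^(-2s)), which equals tanh(s)
    xout = (1 - numpy.exp(-2 * s)) / (1 + numpy.exp(-2 * s))
    return pre_add, s, xout

x = numpy.ones((4, 5)) * 0.01
weightsL1 = numpy.ones((4, x.size))
pre_add, s, xout = forward_reference(x, weightsL1)
print(s)     # each entry is 20 * 0.01 = 0.2, up to float32 rounding
print(xout)  # equal to numpy.tanh(s) up to rounding
```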
**************** Output with C source **************