OpenCL/CIVL structs
- 'args' : a struct containing all of the variables that will be passed into a kernel. It holds an array of void pointers, one per kernel parameter, and an array of int flags recording whether each parameter was allocated with malloc. clReleaseKernel uses these flags.
typedef struct
{
void * param[numArgs];
int mallocflag[numArgs];
} args;
- 'cl_kernel': holds the arguments, the kernel method to be executed, and data pertaining to local, global, and workgroup ids.
typedef struct
{
void * arguments;
char * method;
int global_id;
int local_id;
int workgroup;
} cl_kernel;
Notes About Address Space
- 'clSetKernelArg': sets the value of a single kernel argument, identified by its index.
Currently, this call
err = clSetKernelArg(kernel, 0, sizeof(cl_mem), &input);
is replaced by careful use of malloc, but a dedicated method may be implemented later.
- 'Private (local)' : Each kernel gets its own copy of this variable. malloc + memcpy is the way to do this.
- 'Global' : Every kernel shares this data, so an update made by one is visible to the others. It is passed as a pointer to the shared memory location. For regular non-pointer variables, make a pointer to the variable and use the pointer.
- 'Group' : Each workgroup shares the same data. Currently unimplemented.
((args*)kernel.arguments)->input = input; //global
((args*)kernel.arguments)->param[2] = (int *)malloc(sizeof(int));
memcpy(((args*)kernel.arguments)->param[2], &count, sizeof(int)); //local
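The difference between the two implemented address spaces can be sketched in plain C. This is a minimal illustration under stated assumptions, not the project's code: the names `demo_kernel_args`, `demo_bind`, and `demo_unbind` are hypothetical, and the data is assumed to be `int` as in the snippet above.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical per-kernel argument record with one global and one
 * private argument. */
typedef struct {
    int *global_data;   /* aliases the caller's memory */
    int *private_count; /* this kernel's own copy */
} demo_kernel_args;

/* Bind both arguments. Returns 0 on success, -1 if malloc fails. */
int demo_bind(demo_kernel_args *k, int *shared, int count)
{
    /* Global: keep the caller's pointer, so all kernels see one location. */
    k->global_data = shared;

    /* Private: malloc + memcpy, so each kernel owns a separate copy. */
    k->private_count = malloc(sizeof(int));
    if (k->private_count == NULL)
        return -1;
    memcpy(k->private_count, &count, sizeof(int));
    return 0;
}

/* Free only what demo_bind allocated; the global pointer is not owned. */
void demo_unbind(demo_kernel_args *k)
{
    free(k->private_count);
    k->private_count = NULL;
}
```

With two kernels bound to the same `shared` variable, a write through one kernel's `global_data` is visible through the other's, while a write to one kernel's `private_count` leaves the other kernel's copy and the host's `count` unchanged.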
Memory Model
The memory used by a kernel parameter is determined by the address qualifier on that parameter. Qualifiers take the following form:
__kernel void foo(__qualifier type bar, ...) {}
The address qualifiers are as follows:
- __global : memory allocated from the global address space; images are global by default.
- __constant : a region of read-only memory.
- __local : memory shared by a work-group.
- __private : memory private to each work-item.
Memory fences order the memory operations of a single work-item, and barriers synchronize all work-items in a work-group. Note that data is generally not shared across workgroups except through global memory.
Any method that takes a cl_bool blocking_read parameter lets the host choose whether the call blocks until it completes (CL_TRUE) or returns immediately so that the host can keep working in parallel (CL_FALSE). See http://www.khronos.org/files/opencl-1-2-quick-reference-card.pdf
Methods and equivalents
- 'clCreateKernel': takes in the specified args struct and the name of the method, and returns a kernel.
cl_kernel clCreateKernel(args * argument, char * function)
{
cl_kernel kernel;
kernel.arguments = argument;
kernel.method = function;
for(int j = 0; j < numArgs; j++)
{
((args*)kernel.arguments)->mallocflag[j] = 0;
}
return kernel;
}
- 'clReleaseKernel' : a convenient way to free the variables in each kernel's arguments without memory leaks or freeing a space already freed.
void clReleaseKernel(cl_kernel kernel)
{
for (int i = 0; i < numArgs; i++)
{
printf("I am argument %d with value %d\n", i, ((args*)kernel.arguments)->mallocflag[i]);
if (((args*)kernel.arguments)->mallocflag[i] == 1)
{
printf("and I pass the flag check\n");
free(((args*)kernel.arguments)->param[i]);
}
}
}
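The flag bookkeeping that clReleaseKernel relies on can be sketched as follows. This is an illustration only: `sketch_args`, `sketch_set_private`, `sketch_release`, and the constant `NUM_ARGS` (standing in for numArgs) are hypothetical names, and the sketch assumes the struct starts out zero-initialized. Unlike the version above, the release function clears each flag after freeing, which is one way to make a second call harmless.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define NUM_ARGS 3 /* assumed argument count, standing in for numArgs */

typedef struct {
    void *param[NUM_ARGS];
    int mallocflag[NUM_ARGS];
} sketch_args;

/* Hypothetical helper: copy a value into slot idx and mark the slot as
 * owned. Returns 0 on success, -1 if malloc fails. */
int sketch_set_private(sketch_args *a, int idx, const void *value, size_t size)
{
    assert(idx >= 0 && idx < NUM_ARGS);
    void *copy = malloc(size);
    if (copy == NULL)
        return -1;
    memcpy(copy, value, size);
    if (a->mallocflag[idx] == 1)
        free(a->param[idx]); /* replace an earlier owned copy */
    a->param[idx] = copy;
    a->mallocflag[idx] = 1;
    return 0;
}

/* Free only the flagged slots, then clear flag and pointer so the same
 * slot is never freed twice. Returns the number of slots freed. */
int sketch_release(sketch_args *a)
{
    int freed = 0;
    for (int i = 0; i < NUM_ARGS; i++) {
        if (a->mallocflag[i] == 1) {
            free(a->param[i]);
            a->param[i] = NULL;
            a->mallocflag[i] = 0;
            freed++;
        }
    }
    return freed;
}
```

Slots that hold global pointers keep a flag of 0, so releasing never touches memory the kernel does not own.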
input = clCreateBuffer(context, CL_MEM_READ_ONLY, sizeof(float) * count, NULL, NULL);
clCreateBuffer creates a buffer object with a context, access flags, and a size attached to it. The .cvl version uses only the third parameter (the size) from the right-hand side and mallocs that much space.
input = (int *) malloc(sizeof(int) * count);
err = clEnqueueWriteBuffer(commands, input, CL_TRUE, 0, sizeof(float) * count, data, 0, NULL, NULL);
clEnqueueWriteBuffer writes host data into a buffer object. In the transformation it is currently a memcpy.
memcpy(input, data, sizeof(int) * count);
err = clGetKernelWorkGroupInfo(kernel, device_id, CL_KERNEL_WORK_GROUP_SIZE, sizeof(local), &local, NULL);
In regular OpenCL this asks the device at runtime what work group size to use. The transformation does not use this call; instead, it declares an input variable for the local workgroup size.
$input int LOCAL;
err = clEnqueueNDRangeKernel(commands, kernel, 1, NULL, &global, &local, 0, NULL, NULL);
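clEnqueueNDRangeKernel takes the information gathered so far and queues up the work, split into workgroups, using the specified command queue, kernel, global work size, and local work size. In the transformation, it spawns one process per workgroup and then waits for all of them to finish.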
int clEnqueueNDRangeKernel(cl_command_queue commands, cl_kernel kernel, int global, int local)
{
$assert(global % local == 0);
cl_kernel param[global/local];
$proc procs[global/local];
for(int i = 0; i < global/local; i++)
{
param[i] = kernel;
param[i].workgroup = i;
procs[i] = $spawn worksquare(local, global, param[i]);
}
//this part here is the new clFinish(commands);
for(int i = 0; i < global/local; i++)
{
$wait(procs[i]);
}
return CL_SUCCESS;
}
This method simulates one workgroup by running each of its work-items in turn.
void worksquare(size_t local, size_t global, cl_kernel param)
{
for(int i = local * param.workgroup; i < local * param.workgroup + local; i++)
{
param.local_id = i % local;
param.global_id = i;
//printf("My workgroup id is %d, my global id is %d, my local id is %d\n", param.workgroup, param.global_id, param.local_id);
square(param.workgroup, param.global_id, param.local_id, param.input, param.output, param.count);
}
}
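The id arithmetic used by the two methods above can be checked in isolation with a plain C sketch. It is not CIVL-C: instead of spawning a process per workgroup it visits the workgroups one after another, and the names `sketch_ids` and `sketch_ndrange_ids` are hypothetical.

```c
#include <assert.h>

/* Hypothetical record of the ids one work-item would see. */
typedef struct {
    int workgroup;
    int global_id;
    int local_id;
} sketch_ids;

/* Fill out[0..global-1] with the ids of every work-item.
 * Returns the number of workgroups, or -1 if local does not evenly
 * divide global (the condition the $assert above enforces). */
int sketch_ndrange_ids(int global, int local, sketch_ids *out)
{
    if (local <= 0 || global < 0 || global % local != 0)
        return -1;

    int groups = global / local;
    for (int g = 0; g < groups; g++) {
        /* Same loop bounds as the per-workgroup method above. */
        for (int i = local * g; i < local * g + local; i++) {
            out[i].workgroup = g;
            out[i].global_id = i;
            out[i].local_id = i % local;
        }
    }
    return groups;
}
```

For example, with a global size of 8 and a local size of 4 there are two workgroups, and the work-item with global id 5 belongs to workgroup 1 with local id 1.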
err = clEnqueueReadBuffer( commands, output, CL_TRUE, 0, sizeof(float) * count, results, 0, NULL, NULL );
clEnqueueReadBuffer copies the contents of a buffer, here the kernel's output, back into a host variable. In the transformation it is a memcpy.
memcpy(results, output, sizeof(int) * count);
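Put together, the buffer replacements described above amount to the following round trip. This is a hedged sketch in plain C rather than the actual .cvl output: `sketch_round_trip` is a hypothetical name, the data is assumed to be `int`, and the kernel body is assumed to square each element, as the name `square` suggests.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Copy data in, run the assumed kernel body, copy results out.
 * Returns 0 on success, -1 on bad input or allocation failure. */
int sketch_round_trip(const int *data, int *results, int count)
{
    if (count <= 0)
        return -1;

    size_t bytes = sizeof(int) * (size_t)count;

    /* Stands in for the two clCreateBuffer calls. */
    int *input = malloc(bytes);
    int *output = malloc(bytes);
    if (input == NULL || output == NULL) {
        free(input);
        free(output);
        return -1;
    }

    /* Stands in for clEnqueueWriteBuffer. */
    memcpy(input, data, bytes);

    /* Stands in for the kernel launch (assumed to square each element). */
    for (int i = 0; i < count; i++)
        output[i] = input[i] * input[i];

    /* Stands in for clEnqueueReadBuffer. */
    memcpy(results, output, bytes);

    free(input);
    free(output);
    return 0;
}
```

Because each buffer is an ordinary heap allocation here, it has to be freed once the results have been copied out.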

