Note: err is just an int that records whether each call succeeded; it is not part of the translation at this time.
This iteration of the transformation is fairly basic: each function is reduced to its bare essentials, and the translations may grow more elaborate over time. For now, here are the equivalents, using example code:
err = clGetDeviceIDs(NULL, gpu ? CL_DEVICE_TYPE_GPU : CL_DEVICE_TYPE_CPU, 1, &device_id, NULL);
In OpenCL, this function queries for available devices and stores their IDs through the pointer &device_id. The transformation instead fills in arbitrary numbers as IDs.
It uses only the 3rd and 4th parameters: the 3rd gives the number of devices, and the 4th supplies the name of the array that receives the IDs.
int device_id[NUM_DEVICES];
// assign arbitrary device IDs
for(int i = 0; i < NUM_DEVICES; i++)
{
device_id[i] = i;
}
kernel = clCreateKernel(program, "square", &err);
clCreateKernel creates a kernel object from the built program and the name of a kernel function in it. In the transformation, this step is meant to choose which function to run; it is currently unimplemented.
input = clCreateBuffer(context, CL_MEM_READ_ONLY, sizeof(float) * count, NULL, NULL);
clCreateBuffer creates a buffer (memory object) with the given flags and size. In the .cvl, only the third parameter of the call (the size) is kept, and that space is allocated with malloc:
input = (int *) malloc(sizeof(int) * count);
err = clEnqueueWriteBuffer(commands, input, CL_TRUE, 0, sizeof(float) * count, data, 0, NULL, NULL);
clEnqueueWriteBuffer enqueues a write of host data (here data) into a buffer; CL_TRUE makes it a blocking write. In the transformation it is currently a plain memcpy:
memcpy(input, data, sizeof(int) * count);
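Outside CIVL, the two translations above amount to allocate-then-copy. The plain-C sketch below combines them, using int elements as the .cvl lines do; the helper name create_and_write_buffer is hypothetical and not part of the transformation.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper combining the two translations above:
 * malloc stands in for clCreateBuffer (only the size is used), and
 * memcpy stands in for the blocking clEnqueueWriteBuffer.
 * Returns NULL if allocation fails; the caller must free the result. */
static int *create_and_write_buffer(const int *data, size_t count)
{
    assert(data != NULL);
    int *buffer = malloc(sizeof(int) * count);
    if (buffer == NULL)
        return NULL;
    memcpy(buffer, data, sizeof(int) * count);
    return buffer;
}
```

For example, `int *input = create_and_write_buffer(data, count);` replaces both calls, with a matching `free(input);` once the buffer is no longer needed.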
err = clSetKernelArg(kernel, 0, sizeof(cl_mem), &input);
err |= clSetKernelArg(kernel, 1, sizeof(cl_mem), &output);
err |= clSetKernelArg(kernel, 2, sizeof(unsigned int), &count);
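clSetKernelArg sets the value of one kernel argument, selected by index. The transformation instead builds an array of parameter structs, one per work group, each holding the arguments along with the chosen device. Note that the device selection actually comes from clCreateCommandQueue.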
$assert(global%local == 0);
kernel param[global/local];
for(int i = 0; i < global/local; i++)
{
//Also picks the device to be used
param[i].device_id = device_id[0];
//other parts of the struct
param[i].input = input;
param[i].output = output;
param[i].count = count;
}
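The struct type kernel is used above without a definition on this page. The sketch below is a hypothetical plain-C version whose fields are inferred from how the translation uses them (int elements, matching the malloc above; count as unsigned int, matching its clSetKernelArg size), with the argument-setting loop wrapped in a function. The name set_kernel_args is hypothetical, and assert stands in for $assert.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical parameter struct; fields inferred from their uses in the
 * translation, so the real .cvl definition may differ. */
typedef struct {
    int device_id;      /* device chosen for this work group */
    int workgroup;      /* work-group index, set before spawning */
    int global_id;      /* set per work item inside worksquare */
    int local_id;       /* set per work item inside worksquare */
    int *input;         /* stands in for the input cl_mem buffer */
    int *output;        /* stands in for the output cl_mem buffer */
    unsigned int count; /* number of elements (kernel argument 2) */
} kernel;

/* Fill one parameter struct per work group, mirroring the loop above.
 * param must have room for global / local entries. */
static void set_kernel_args(kernel *param, size_t global, size_t local,
                            const int *device_id, int *input, int *output,
                            unsigned int count)
{
    assert(local > 0 && global % local == 0); /* stands in for $assert */
    for (size_t i = 0; i < global / local; i++) {
        param[i].device_id = device_id[0]; /* always the first device */
        param[i].input = input;
        param[i].output = output;
        param[i].count = count;
    }
}
```

The workgroup, global_id and local_id fields are left unset here because the translation fills them in later, when spawning and inside worksquare.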
err = clGetKernelWorkGroupInfo(kernel, device_id, CL_KERNEL_WORK_GROUP_SIZE, sizeof(local), &local, NULL);
In regular OpenCL this queries, at runtime, the maximum work-group size the device can use for this kernel. The transformation does not use it; instead, it declares the local work-group size as an input:
$input int LOCAL;
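In real OpenCL, the size queried above is an upper bound on the local size for this kernel, and OpenCL 1.x also requires the global size to be a multiple of the local size. When LOCAL is a free input, a model could check both constraints explicitly. The plain-C sketch below does so, with assert in place of CIVL's $assert; the function name num_workgroups and the max_wg_size parameter are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: checks the constraints a 1-D OpenCL 1.x launch
 * places on the local work-group size and returns the number of work
 * groups. max_wg_size stands for the value clGetKernelWorkGroupInfo
 * reports for CL_KERNEL_WORK_GROUP_SIZE. */
static size_t num_workgroups(size_t global, size_t local, size_t max_wg_size)
{
    assert(local > 0);            /* a work group needs at least one item */
    assert(local <= max_wg_size); /* bounded by the kernel's maximum */
    assert(global % local == 0);  /* same check as $assert(global%local == 0) */
    return global / local;
}
```

For instance, num_workgroups(1024, 64, 256) returns 16, which is the number of processes the spawn loop would create.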
err = clEnqueueNDRangeKernel(commands, kernel, 1, NULL, &global, &local, 0, NULL, NULL);
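clEnqueueNDRangeKernel enqueues the kernel on the command queue commands over a one-dimensional range of global work items, split into work groups of local items each. The transformation spawns one process per work group and then waits for all of them: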
$proc procs[global/local];
for(int i = 0; i < global/local; i++)
{
param[i].workgroup = i;
//procs[i] = $spawn square(param[i].global_id, param[i].input, param[i].output, param[i].count);
procs[i] = $spawn worksquare(local, global, param[i]);
}
for(int i = 0; i < global/local; i++)
{
$wait(procs[i]);
}
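The worksquare function simulates one block work group: it runs the kernel sequentially for each work item in the group's contiguous range of global IDs, setting the global and local IDs before each call.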
void worksquare(size_t local, size_t global, kernel param)
{
for(int i = local * param.workgroup; i < local * param.workgroup + local; i++)
{
param.local_id = i % local;
param.global_id = i;
//printf("My workgroup id is %d, my global id is %d, my local id is %d\n", param.workgroup, param.global_id, param.local_id);
square(param.workgroup, param.global_id, param.local_id, param.input, param.output, param.count);
}
}
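To check the index arithmetic outside CIVL, the plain-C sketch below runs the same block decomposition sequentially. It assumes the kernel squares each in-range element, as its name suggests (the actual square body is not shown on this page), and replaces the $spawn/$wait loops with ordinary calls. The names run_workgroup and run_ndrange are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the translated kernel: each work item squares
 * one element if its global ID is in range. */
static void square(int workgroup, int global_id, int local_id,
                   const int *input, int *output, unsigned int count)
{
    (void)workgroup; /* unused by this kernel */
    (void)local_id;
    if ((unsigned int)global_id < count)
        output[global_id] = input[global_id] * input[global_id];
}

/* Runs one work group: a contiguous block of `local` global IDs starting
 * at local * workgroup, as worksquare does. */
static void run_workgroup(size_t local, int workgroup,
                          const int *input, int *output, unsigned int count)
{
    for (size_t i = local * (size_t)workgroup;
         i < local * (size_t)workgroup + local; i++) {
        int global_id = (int)i;
        int local_id = (int)(i % local);
        square(workgroup, global_id, local_id, input, output, count);
    }
}

/* Sequential replacement for the $spawn / $wait loops: work groups run
 * one after another instead of as concurrent processes. */
static void run_ndrange(size_t global, size_t local,
                        const int *input, int *output, unsigned int count)
{
    assert(local > 0 && global % local == 0);
    for (size_t g = 0; g < global / local; g++)
        run_workgroup(local, (int)g, input, output, count);
}
```

With global = 8 and local = 4 there are two work groups; group 1 handles global IDs 4 through 7, so the work item with global ID 6 has local ID 2. Running the groups one after another is a simplification: it cannot expose the interleavings between work groups that the $spawn version is there to model.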
err = clEnqueueReadBuffer( commands, output, CL_TRUE, 0, sizeof(float) * count, results, 0, NULL, NULL );
clEnqueueReadBuffer copies the contents of a buffer, here the kernel's output, back into a host variable (results). In the transformation it is again a memcpy:
memcpy(results, output, sizeof(int) * count);
