| | 64 | === The Signature of a Kernel === |
| | 65 | A typical CUDA kernel call has the form: |
| | 66 | {{{ |
| | 67 | K<<<gridDim, blockDim, memSize, cudaStream>>>(args); |
| | 68 | }}} |
| | 69 | The four parameters inside the angle brackets are called the ''execution configuration parameters'' or simply the ''configuration parameters''. The parameters `gridDim` and `blockDim` are implicitly available within the kernel under those names. To support these features, we transform the definition of the kernel into a regular function with the signature |
| | 70 | {{{ |
| | 71 | void _cuda_K(dim3 gridDim, dim3 blockDim, size_t _cuda_mem_size, cudaStream_t _cuda_stream, args); |
| | 72 | }}} |
| | 73 | Each call to the kernel is then transformed to match this new signature, with the configuration parameters passed as the leading arguments. |
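The signature change can be illustrated with a minimal sketch in plain C. The `dim3` and `cudaStream_t` definitions below are hypothetical stand-ins so the example compiles without the CUDA headers, and the kernel argument `out` and the helper `call_K_example` are invented for illustration; this is not the code the transformer actually emits.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the CUDA types (assumption: the real
 * definitions come from the CUDA headers or the CIVL-C library). */
typedef struct { unsigned int x, y, z; } dim3;
typedef int cudaStream_t;

/* Original CUDA source, shown for comparison only:
 *
 *   __global__ void K(unsigned int *out) { ... }
 *   K<<<grid, block, 0, 0>>>(&out);
 */

/* Transformed definition: the four configuration parameters become the
 * leading arguments, so gridDim and blockDim are in scope under the
 * names the kernel body expects. */
void _cuda_K(dim3 gridDim, dim3 blockDim, size_t _cuda_mem_size,
             cudaStream_t _cuda_stream, unsigned int *out)
{
    (void)_cuda_mem_size;   /* unused in this sketch */
    (void)_cuda_stream;     /* unused in this sketch */
    /* The body reads gridDim and blockDim as ordinary parameters. */
    *out = gridDim.x * gridDim.y * gridDim.z
         * blockDim.x * blockDim.y * blockDim.z;
}

/* Transformed call site (illustrative helper, not part of the source). */
unsigned int call_K_example(void)
{
    dim3 grid  = { 2, 1, 1 };
    dim3 block = { 4, 1, 1 };
    unsigned int out = 0;
    /* was: K<<<grid, block, 0, 0>>>(&out); */
    _cuda_K(grid, block, 0, 0, &out);
    return out;
}
```

Here the launch of 2 blocks with 4 threads each makes the body compute a total of 8 threads, using nothing but the ordinary function parameters.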
| | 74 | |
| | 75 | === The Layers of a CUDA Kernel in CIVL-C === |
| | 76 | The transformation described above lets us support the configuration parameters. However, we must still add code that emulates the actual execution of a CUDA kernel under the four given configuration parameters. That means spawning the appropriate number of threads, each with its own local CUDA parameters `blockIdx` and `threadIdx` in scope and initialized, and then enqueuing the kernel into the given stream, waiting as necessary on other CUDA kernels in that stream. |
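As a rough illustration of the thread-spawning requirement, the following plain C sketch runs one invocation of a per-thread body for every (`blockIdx`, `threadIdx`) pair. It makes several simplifying assumptions: the threads run sequentially in nested loops rather than concurrently, stream ordering and shared memory are ignored, and the names `thread_body` and `emulate_example` and the stand-in type definitions are hypothetical. An actual CIVL-C encoding would spawn real concurrent processes instead.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the CUDA types. */
typedef struct { unsigned int x, y, z; } dim3;
typedef int cudaStream_t;

/* Per-thread body: each thread writes its global linear index. */
static void thread_body(dim3 gridDim, dim3 blockDim,
                        dim3 blockIdx, dim3 threadIdx, int *data)
{
    unsigned int block_lin  = blockIdx.x
        + gridDim.x * (blockIdx.y + gridDim.y * blockIdx.z);
    unsigned int per_block  = blockDim.x * blockDim.y * blockDim.z;
    unsigned int thread_lin = threadIdx.x
        + blockDim.x * (threadIdx.y + blockDim.y * threadIdx.z);
    unsigned int i = block_lin * per_block + thread_lin;
    data[i] = (int)i;
}

/* Sequential emulation: one call of thread_body per thread, each with
 * its own blockIdx and threadIdx values in scope. */
void _cuda_K(dim3 gridDim, dim3 blockDim, size_t _cuda_mem_size,
             cudaStream_t _cuda_stream, int *data)
{
    dim3 blockIdx, threadIdx;
    (void)_cuda_mem_size;   /* shared memory is not modeled here */
    (void)_cuda_stream;     /* stream ordering is not modeled here */
    for (blockIdx.z = 0; blockIdx.z < gridDim.z; blockIdx.z++)
    for (blockIdx.y = 0; blockIdx.y < gridDim.y; blockIdx.y++)
    for (blockIdx.x = 0; blockIdx.x < gridDim.x; blockIdx.x++)
        for (threadIdx.z = 0; threadIdx.z < blockDim.z; threadIdx.z++)
        for (threadIdx.y = 0; threadIdx.y < blockDim.y; threadIdx.y++)
        for (threadIdx.x = 0; threadIdx.x < blockDim.x; threadIdx.x++)
            thread_body(gridDim, blockDim, blockIdx, threadIdx, data);
}

/* Launches a 2x2 grid of 3-thread blocks and counts the slots that
 * were written by the expected thread. */
int emulate_example(void)
{
    dim3 grid  = { 2, 2, 1 };
    dim3 block = { 3, 1, 1 };
    int data[12];
    int i, ok = 0;
    for (i = 0; i < 12; i++) data[i] = -1;
    _cuda_K(grid, block, 0, 0, data);
    for (i = 0; i < 12; i++) if (data[i] == i) ok++;
    return ok;
}
```

A sequential loop covers only one possible interleaving of the threads, which is why a verification-oriented encoding needs genuine concurrency and explicit stream synchronization rather than this simplification.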