Changes between Version 5 and Version 6 of Implementation_of_CUDA_in_CIVL


Timestamp:
03/17/22 21:01:24 (4 years ago)
Author:
Alex Wilton
Comment:

--

  • Implementation_of_CUDA_in_CIVL

    v5 v6  
=== The Layers of a CUDA Kernel in CIVL-C ===
The transformation described above lets us support the configuration parameters. However, we still need to add code that emulates the actual execution of a CUDA kernel under the four given configuration parameters. That means we must spawn the appropriate number of threads, each with the local CUDA variables `blockIdx` and `threadIdx` declared in scope and assigned the correct values, and then enqueue the kernel into the given stream, waiting as necessary on other CUDA kernels in the stream. The kernel is therefore transformed to accomplish this.

The transformed kernel is composed of several simple layers, which we discuss one at a time, revealing more detail as we go. The first layer creates the kernel instance and enqueues it onto the appropriate stream.
{{{
void _cuda_K(dim3 gridDim, dim3 blockDim, size_t _cuda_mem_size, cudaStream_t _cuda_stream, args) {
  void _cuda_kernel($cuda_kernel_instance_t* _cuda_this, cudaEvent_t _cuda_event) {
    ...
  }
  $cuda_enqueue_kernel(_cuda_stream, _cuda_kernel);
}
}}}
For clarity, we will refer to `_cuda_kernel` as the ''inner kernel'' of our original kernel `K`.

`$cuda_enqueue_kernel` does the following:
1. Creates a new `cudaEvent_t` called `e` based on the stream being used (see below for further details).
2. Creates a new `$cuda_kernel_instance_t` and enqueues it onto the stream.
3. Spawns the inner kernel as a new process, passing in the `$cuda_kernel_instance_t` created in step 2 and the `cudaEvent_t` created in step 1 as its parameters.
4. Sets the `process` field of the `$cuda_kernel_instance_t` from step 2 to the process spawned in step 3.
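
The four steps above can be sketched in plain C (not CIVL-C). This is only an illustrative model under stated assumptions: the names `Instance`, `Event`, `Stream`, `fake_spawn`, `enqueue_kernel`, and `recording_kernel` are hypothetical, the event is simplified to a single predecessor from the same stream (the null stream is ignored), and "spawning" is modeled by calling the kernel directly, whereas real CIVL processes run concurrently.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins; the real CIVL types are richer than this. */
typedef struct Instance Instance;
typedef struct Event { Instance *waitFor; } Event;  /* simplified: one predecessor */
typedef void (*InnerKernel)(Instance *self, Event ev);

struct Instance {
  int process;  /* stand-in for a process handle; 0 means "not yet spawned" */
};

typedef struct Stream { Instance *last; } Stream;   /* most recently enqueued kernel */

static int next_pid = 1;

/* Stand-in for spawning a process: runs the kernel right away and
 * returns a fake process id. */
static int fake_spawn(InnerKernel k, Instance *inst, Event ev) {
  k(inst, ev);
  return next_pid++;
}

Instance *enqueue_kernel(Stream *s, InnerKernel k) {
  Event e = { s->last };                  /* step 1: event from the stream's state */
  Instance *inst = malloc(sizeof *inst);  /* step 2: new instance, enqueued on s */
  assert(inst != NULL);
  inst->process = 0;
  s->last = inst;
  int pid = fake_spawn(k, inst, e);       /* step 3: spawn with instance and event */
  inst->process = pid;                    /* step 4: record the spawned process */
  return inst;
}

/* Example inner kernel that only records which instance it was told to wait for. */
static Instance *last_waited_for = NULL;

void recording_kernel(Instance *self, Event ev) {
  (void)self;
  last_waited_for = ev.waitFor;
}
```

In this model, enqueuing two kernels on the same stream hands the second kernel an event that refers to the first, which is the ordering information the inner kernel needs before it runs.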

Recall that a CUDA kernel launched in a non-null stream, call it `s`, must wait for all other kernels that were enqueued in `s` or in the null stream at the time the kernel was launched. Additionally, any kernel launched on the null stream must wait for all kernels enqueued in any stream at the time of launch. A `cudaEvent_t` is a structure that stores the set of kernels we need to wait on.
{{{
typedef struct _CUevent cudaEvent_t;
struct _CUevent{
  $cuda_kernel_instance_t** instances;
  int numInstances;
};
}}}
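
To make the waiting rule concrete, here is a minimal plain-C sketch of how such an event might be populated. It is an illustration only: `KernelInstance`, `Stream`, `Event`, `MAX_STREAMS`, and `make_event` are hypothetical names, and the sketch assumes that index 0 of the stream array models the null stream and that each stream remembers only its most recently enqueued kernel.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the CIVL types; not the real library definitions. */
typedef struct KernelInstance { int id; } KernelInstance;

typedef struct Stream {
  KernelInstance *last;  /* most recently enqueued kernel, or NULL if none */
} Stream;

#define MAX_STREAMS 8

typedef struct Event {
  KernelInstance *instances[MAX_STREAMS];
  int numInstances;
} Event;

/* Collect the kernels that a launch on streams[target] must wait for.
 * Assumption: streams[0] models the null stream. */
Event make_event(const Stream *streams, int numStreams, int target) {
  assert(numStreams <= MAX_STREAMS);
  assert(target >= 0 && target < numStreams);
  Event e = { {NULL}, 0 };
  for (int i = 0; i < numStreams; i++) {
    /* A null-stream launch waits on every stream; any other launch
     * waits only on its own stream and on the null stream. */
    int relevant = (target == 0) || (i == target) || (i == 0);
    if (relevant && streams[i].last != NULL)
      e.instances[e.numInstances++] = streams[i].last;
  }
  return e;
}
```

With three streams that each hold one pending kernel, a launch on stream 1 collects two instances (from the null stream and stream 1), while a launch on the null stream collects all three.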
Therefore, when we create the `cudaEvent_t` in step 1, we simply grab the most recent kernel from each of the streams that we want to wait on and store them in this new event. We then pass this event to the inner kernel so that the inner kernel can wait on these other kernels before actually running itself. This can be seen in the next layer of our transformed kernel `K`:
{{{
void _cuda_K(dim3 gridDim, dim3 blockDim, size_t _cuda_mem_size, cudaStream_t _cuda_stream, args) {
  void _cuda_kernel($cuda_kernel_instance_t* _cuda_this, cudaEvent_t _cuda_event) {
    void _cuda_block(uint3 blockIdx) {
      ...
    }
    $cuda_wait_in_queue(_cuda_this, _cuda_event);
    $cuda_run_procs(gridDim, _cuda_block);
    $cuda_kernel_finish(_cuda_this);
  }
  $cuda_enqueue_kernel(_cuda_stream, _cuda_kernel);
}
}}}
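
As a rough illustration of the `$cuda_run_procs(gridDim, _cuda_block)` call, the plain-C sketch below invokes a block function once for every index in a three-dimensional grid. It rests on assumptions that the text above does not confirm: that `$cuda_run_procs` runs one instance of its function argument per index and returns only when all have finished, and that a sequential loop is an acceptable stand-in for concurrently spawned processes. The names `Dim3`, `BlockFn`, `run_procs`, and `tag_block` are hypothetical, and the `ctx` pointer replaces the variable capture that a nested CIVL-C function gets for free.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for dim3 / uint3. */
typedef struct { unsigned int x, y, z; } Dim3;

/* Plain C has no nested functions, so captured state is passed through ctx. */
typedef void (*BlockFn)(Dim3 blockIdx, void *ctx);

/* Sequential stand-in: call block once per grid index and return how many
 * calls were made. The real primitive presumably spawns processes instead. */
unsigned int run_procs(Dim3 gridDim, BlockFn block, void *ctx) {
  assert(block != NULL);
  unsigned int count = 0;
  for (unsigned int z = 0; z < gridDim.z; z++)
    for (unsigned int y = 0; y < gridDim.y; y++)
      for (unsigned int x = 0; x < gridDim.x; x++) {
        Dim3 blockIdx = { x, y, z };
        block(blockIdx, ctx);
        count++;
      }
  return count;
}

/* Example block body: folds each block index into a running sum. */
void tag_block(Dim3 blockIdx, void *ctx) {
  unsigned int *sum = ctx;
  *sum += blockIdx.x + 10u * blockIdx.y + 100u * blockIdx.z;
}
```

For a grid of dimensions 2 x 3 x 1, `run_procs` makes six calls, one per `blockIdx` value, which mirrors the requirement that every block sees its own `blockIdx` in scope.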