Changes between Version 3 and Version 4 of Implementation_of_CUDA_in_CIVL


Ignore:
Timestamp:
03/16/22 19:45:20 (4 years ago)
Author:
Alex Wilton
Comment:

--

Legend:

Unmodified
Added
Removed
Modified
  • Implementation_of_CUDA_in_CIVL

    v3 v4  
    44
    55== The CUDA Context and CUDA Streams ==
     6=== Structural Organization ===
    67cuda-civl provides a structure called `$cuda_context_t` which is meant to house all CUDA information that pertains globally to a CUDA program. As such, our translation only creates one instance of this structure as a global variable simply called `$cuda_current_context`. Currently, the only information that `$cuda_context_t` manages is the set of CUDA streams being used in the program (including the null stream which is present in every program).
    78{{{
     
    2728Here `mostRecent` is the front of the queue, which is implemented as a list of `$cuda_kernel_instance_t`'s. `usable` is a boolean meant to signal whether kernels can be enqueued onto it (when `usable == true`) or whether the stream can be destroyed (when `usable == false`). The purpose of `cudaStream_t` being a pointer is a bit unclear to me as of writing this. One thing to note about the use of a pointer here is that in the places in which `cudaStream_t` is used in the cuda-civl library, there is logic to basically interpret the null pointer as the null stream.
    2829
    29 We will discuss everything involved in the data type `$cuda_kernel_instance_t` later ('''ADD SECTION REF HERE'''). Before that, we shall discuss how the streams held by `$cuda_current_context` are initialized, managed, and destroyed.
     30The last component of these structures is the `$cuda_kernel_instance_t` datatype. It is a simple bundling of a $proc which holds a reference to the spawned kernel (details of this are discussed later in section '''REF HERE''') along with an enumerator that states whether the kernel is waiting (for the completion of other kernels in its stream), running, or finished. Here is the definition of the struct along with the definition of the enum it uses for its status:
     31{{{
     32typdef enum $cuda_kernel_status {
     33  $cuda_kernel_status_waiting,
     34  $cuda_kernel_status_running,
     35  $cuda_kernel_status_finished
     36} $cuda_kernel_status;
     37
     38typedef struct $cuda_kernel_instance $cuda_kernel_instance_t;
     39struct $cuda_kernel_instance {
     40  $proc process;
     41  $cuda_kernel_status status;
     42};
     43}}}
     44
     45Because a kernel is simply a $proc with a status, the only things one can really do with a kernel is check its status or wait for the proc to finish. Therefore, the entire purpose of all of the structures discussed above is to logically organize the processes created (from CUDA kernels) in a centralized, globally accessible place so that we can easily check the status of these kernels or select subsets of these kernels to wait on.
     46
     47=== Management of CUDA Streams and Kernels ===
    3048
    3149Before executing any CUDA functions, we need to initialize our `$cuda_current_context`. Additionally, after we are done executing all CUDA functions, we need to destroy all the kernels and streams that were created over the lifetime of the program. The way we do this is by renaming the original `main()` function to `_civl_main()` and then creating a new `main()` function that looks like this:
     
    3957`$cuda_finalize()` waits for all kernels in all of its streams to finish completion, and then frees all memory relating to kernels and streams. `$cuda_init()` creates the null stream of `$cuda_current_context` with the function `$cuda_stream_create()`. The method `$cuda_stream_create()` mallocs a new stream, sets `usable` to true, and then adds a dummy `$cuda_kernel_instance_t` to the list with a finished status. The purpose of this dummy kernel is to simplify the logic in our library so that we don't need to check for an empty list each time we use a stream.
    4058
    41 Outside of this `main()` function, new streams are created with the CUDA library function `cudaStreamCreate(cudaStream_t* pStream)`. Our implementation of this function also calls `$cuda_stream_create()`, but it then adds this to the non-default stream list of `$cuda_current_context` and returns `cudaSuccess`.
     59In standard CUDA code, the way one creates new streams are with the CUDA library function `cudaStreamCreate(cudaStream_t* pStream)`. Our implementation of this function also calls `$cuda_stream_create()`, but it then adds this to the non-default stream list of `$cuda_current_context` and returns `cudaSuccess`.
    4260
    43 There are two primary actions one can do with a stream. One can wait on a stream (that is, wait until all kernels of the stream are finished) with the function `$cuda_stream_wait(cudaStream_t)`, or one can add a kernel instance to the stream with the function `$cuda_enqueue_kernel(cudaStream_t, (void ($cuda_kernel_instance_t*, cudaEvent_t)))`, however to explain the details of these functions we now need to learn more about how we emulate cuda kernels in CIVL.
     61There are two primary actions one can do with a stream. One can wait on a stream (that is, wait until all kernels of the stream are finished) with the function `$cuda_stream_wait(cudaStream_t)`, or one can add a kernel instance to the stream with the function `$cuda_enqueue_kernel(cudaStream_t, (void ($cuda_kernel_instance_t*, cudaEvent_t)))`. However, to explain the details of these functions we now need to learn more about how we emulate cuda kernels in CIVL.
    4462
    4563== CUDA Kernels ==