= CUDA Overview =

== Introduction ==

This page describes how we translate CUDA programs into CIVL-C code. Primarily, we focus on how the cuda-civl library is organized and used in our final translation of a CUDA program. We assume basic knowledge of CUDA concepts such as streams, kernels, blocks, and threads.

== The CUDA Context and CUDA Streams ==

cuda-civl provides a structure called `$cuda_context_t`, which is meant to house all CUDA information that pertains globally to a CUDA program. As such, our translation creates only one instance of this structure, as a global variable simply called `$cuda_current_context`. Currently, the only information that `$cuda_context_t` manages is the set of CUDA streams being used in the program (including the null stream, which is present in every program).

{{{
typedef struct $cuda_context $cuda_context_t;

struct $cuda_context {
    $cuda_stream_node_t* headNode;
    cudaStream_t nullStream;
    int numStreams;
};
}}}

`$cuda_stream_node_t` is simply a structure which holds a `cudaStream_t` and a pointer to another `$cuda_stream_node_t`; in other words, these nodes form a linked list of `cudaStream_t`'s. In general, we use the pattern in which types of the form `T_node_t` are structures representing nodes of a linked list containing elements of type `T`. The streams in this list are the "non-default" or "non-null" CUDA streams meant for asynchronous execution of kernels. The integer `numStreams` records the length of this list, and `nullStream` holds the null stream, which is used by default when executing kernels. Thus, the total number of streams at any given point in the program is `$cuda_current_context.numStreams + 1`. `cudaStream_t` is an actual type in CUDA code; however, cuda-civl gives its own definition of it so that we can use it in CIVL code and analyze it.
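To make the bookkeeping concrete, here is a plain-C sketch of the context and its linked list of non-null streams. This is our own illustrative mock, not the cuda-civl code: `$` is not a legal C identifier character, so the `$`-prefixes are dropped, the stream structure is reduced to a stub, and the helper `add_stream()` is hypothetical.

{{{
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

/* Plain-C stand-ins for the CIVL-C types described above; '$' is not
 * legal in C identifiers, so the '$'-prefixes are dropped. */
typedef struct _CUstream { bool usable; } _CUstream;
typedef _CUstream *cudaStream_t;

typedef struct cuda_stream_node {
    cudaStream_t stream;
    struct cuda_stream_node *next;
} cuda_stream_node_t;

typedef struct cuda_context {
    cuda_stream_node_t *headNode; /* list of non-null streams */
    cudaStream_t nullStream;      /* default stream */
    int numStreams;               /* length of the headNode list */
} cuda_context_t;

/* The single global instance, analogous to $cuda_current_context. */
cuda_context_t cuda_current_context;

/* Hypothetical helper: prepend a fresh non-null stream to the list. */
static cudaStream_t add_stream(void) {
    cudaStream_t s = malloc(sizeof *s);
    s->usable = true;
    cuda_stream_node_t *node = malloc(sizeof *node);
    node->stream = s;
    node->next = cuda_current_context.headNode;
    cuda_current_context.headNode = node;
    cuda_current_context.numStreams++;
    return s;
}

int main(void) {
    cuda_current_context.nullStream = malloc(sizeof(_CUstream));
    cuda_current_context.nullStream->usable = true;
    add_stream();
    add_stream();
    /* Total streams = non-null streams + the ever-present null stream. */
    printf("%d\n", cuda_current_context.numStreams + 1);
    return 0;
}
}}}

With two non-null streams created, the program prints `3`, matching the `numStreams + 1` counting rule described above.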
Because a CUDA stream is essentially just a queue of kernels, our definition of `cudaStream_t` is very simple:

{{{
typedef struct _CUstream _CUstream;
typedef _CUstream* cudaStream_t;

struct _CUstream {
    $cuda_kernel_instance_node_t* mostRecent;
    _Bool usable;
};
}}}

Here `mostRecent` is the front of the queue, which is implemented as a linked list of `$cuda_kernel_instance_t`'s. `usable` is a boolean signaling whether kernels can be enqueued onto the stream (when `usable == true`) or whether the stream can be destroyed (when `usable == false`). The purpose of making `cudaStream_t` a pointer type is a bit unclear to me as of this writing. One thing to note about the use of a pointer here is that wherever `cudaStream_t` is used in the cuda-civl library, there is logic to interpret the null pointer as the null stream.

We will discuss everything involved in the data type `$cuda_kernel_instance_t` later ('''ADD SECTION REF HERE'''). Before that, we shall discuss how the streams held by `$cuda_current_context` are initialized, managed, and destroyed.

Before executing any CUDA functions, we need to initialize `$cuda_current_context`. Additionally, after all CUDA functions have executed, we need to destroy all the kernels and streams that were created over the lifetime of the program. We do this by renaming the original `main()` function to `_civl_main()` and then creating a new `main()` function that looks like this:

{{{
int main() {
    $cuda_init();
    _civl_main();
    $cuda_finalize();
}
}}}

Here `$cuda_init()` simply creates the null stream of `$cuda_current_context` with the function `$cuda_stream_create()`. `$cuda_finalize()` waits for all kernels in all streams to complete, and then frees all memory relating to kernels and streams.
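The init/finalize wrapping above can be sketched as a runnable plain-C mock. This is our own illustration under stated assumptions: the `$`-prefixes are dropped (they are not legal in C), the kernel queue is reduced to bare list nodes, and "waiting for kernels to complete" is modeled as simply draining and freeing the queue.

{{{
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

/* Minimal stand-in for $cuda_kernel_instance_node_t: just a list node. */
typedef struct kernel_instance_node {
    struct kernel_instance_node *next;
} kernel_instance_node_t;

typedef struct _CUstream {
    kernel_instance_node_t *mostRecent; /* front of the kernel queue */
    bool usable;
} _CUstream;
typedef _CUstream *cudaStream_t;

/* For brevity this mock tracks only the null stream, not the full list. */
static cudaStream_t nullStream;

/* Analogue of $cuda_stream_create(): allocate an empty, usable stream. */
static cudaStream_t cuda_stream_create(void) {
    cudaStream_t s = malloc(sizeof *s);
    s->mostRecent = NULL;
    s->usable = true;
    return s;
}

/* Analogue of $cuda_init(): create the context's null stream. */
static void cuda_init(void) {
    nullStream = cuda_stream_create();
}

/* Analogue of $cuda_finalize(): "wait for" (here: drain and free) all
 * queued kernels, then free the stream itself. */
static void cuda_finalize(void) {
    nullStream->usable = false;
    kernel_instance_node_t *k = nullStream->mostRecent;
    while (k != NULL) {
        kernel_instance_node_t *next = k->next;
        free(k);
        k = next;
    }
    free(nullStream);
    printf("finalized\n");
}

/* The renamed original main(). */
static void _civl_main(void) {
    printf("user code ran\n");
}

int main(void) {
    cuda_init();
    _civl_main();
    cuda_finalize();
    return 0;
}
}}}

Running this prints `user code ran` followed by `finalized`, mirroring the order enforced by the generated `main()`: initialization, the user's original program, then teardown.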