== Supported Features == * CUDA kernels with the `__global__` specifier * The dim3 struct type * Use of the CUDA variables `threadIdx`, `blockIdx`, `gridDim`, and `blockDim` * `__syncthreads` * `__shared__` * Enqueuing multiple kernel calls with the null stream * `cudaMalloc` * `cudaMemcpy` * `cudaFree` * `cudaDeviceSynchronize` == Major Limitations == * Lacks support for other stream types === Missing Features === * Use of the `warpSize` variable * Atomic functions (e.g. `atomicAdd`)