== Supported Features ==

* CUDA kernels with the `__global__` specifier
* The `dim3` struct type
* Use of the CUDA variables `threadIdx`, `blockIdx`, `gridDim`, and `blockDim`
* Block-level synchronization with `__syncthreads`
* Shared memory declared with `__shared__`
* Enqueuing multiple kernel launches on the null stream
* `cudaMalloc`
* `cudaMemcpy`
* `cudaFree`
* `cudaDeviceSynchronize`
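
As a rough illustration, the following sketch uses only the features listed above. The kernel names (`reverseTiles`, `addOne`) and the sizes `N` and `BLOCK` are made up for this example, and error checking of the runtime calls is omitted for brevity.

```cuda
#include <stdio.h>
#include <cuda_runtime.h>

#define N 256      /* total number of elements (illustrative) */
#define BLOCK 64   /* threads per block; N is a multiple of BLOCK */

/* Reverses each BLOCK-sized tile of data in place, staging it in shared memory. */
__global__ void reverseTiles(int *data) {
    __shared__ int tile[BLOCK];
    int local = threadIdx.x;
    int global = blockIdx.x * blockDim.x + local;
    tile[local] = data[global];
    __syncthreads();  /* every thread must have written its slot before any read */
    data[global] = tile[blockDim.x - 1 - local];
}

/* Adds 1 to every element using a grid-stride loop. */
__global__ void addOne(int *data, int n) {
    int stride = gridDim.x * blockDim.x;
    for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < n; i += stride)
        data[i] += 1;
}

int main(void) {
    int host[N];
    for (int i = 0; i < N; i++)
        host[i] = i;

    int *dev = NULL;
    cudaMalloc((void **)&dev, N * sizeof(int));
    cudaMemcpy(dev, host, N * sizeof(int), cudaMemcpyHostToDevice);

    dim3 grid(N / BLOCK);
    dim3 block(BLOCK);

    /* Both launches go to the null stream, so they run in order. */
    reverseTiles<<<grid, block>>>(dev);
    addOne<<<grid, block>>>(dev, N);
    cudaDeviceSynchronize();

    cudaMemcpy(host, dev, N * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(dev);

    printf("host[0] = %d\n", host[0]);  /* prints "host[0] = 64" */
    return 0;
}
```

The second launch relies on the ordering guarantee of the null stream: `addOne` does not start until `reverseTiles` has finished, so no explicit synchronization is needed between them.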

== Major Limitations ==

* No support for streams other than the null stream

=== Missing Features ===

* Use of the `warpSize` variable
* Atomic functions (e.g. `atomicAdd`)
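
Code that would normally accumulate a result with `atomicAdd` can often be restructured to use only the supported subset. The following is a minimal sketch of that idea, not a confirmed recipe: each block reduces its elements in shared memory and writes one partial sum, and the host adds the partial sums. The names `blockSum`, `N`, and `BLOCK` are hypothetical, and the halving loop assumes `BLOCK` is a power of two.

```cuda
#include <stdio.h>
#include <cuda_runtime.h>

#define N 1024      /* total number of elements (illustrative) */
#define BLOCK 128   /* threads per block; must be a power of two */

/* Each block sums its BLOCK elements with a shared-memory tree reduction
   and stores the result in partial[blockIdx.x]. No atomics, no warpSize. */
__global__ void blockSum(const int *in, int *partial) {
    __shared__ int buf[BLOCK];
    int t = threadIdx.x;
    buf[t] = in[blockIdx.x * blockDim.x + t];
    __syncthreads();

    for (int half = blockDim.x / 2; half > 0; half /= 2) {
        if (t < half)
            buf[t] += buf[t + half];
        __syncthreads();  /* reached by all threads, outside the conditional */
    }

    if (t == 0)
        partial[blockIdx.x] = buf[0];
}

int main(void) {
    const int blocks = N / BLOCK;
    int hostIn[N];
    int hostPartial[N / BLOCK];
    for (int i = 0; i < N; i++)
        hostIn[i] = 1;

    int *devIn = NULL;
    int *devPartial = NULL;
    cudaMalloc((void **)&devIn, N * sizeof(int));
    cudaMalloc((void **)&devPartial, blocks * sizeof(int));
    cudaMemcpy(devIn, hostIn, N * sizeof(int), cudaMemcpyHostToDevice);

    blockSum<<<blocks, BLOCK>>>(devIn, devPartial);
    cudaDeviceSynchronize();

    cudaMemcpy(hostPartial, devPartial, blocks * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(devIn);
    cudaFree(devPartial);

    /* Final accumulation on the host replaces a device-side atomicAdd. */
    int total = 0;
    for (int b = 0; b < blocks; b++)
        total += hostPartial[b];
    printf("total = %d\n", total);  /* prints "total = 1024" */
    return 0;
}
```

The trade-off is an extra device-to-host copy of one value per block, which is usually small compared with the input.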