Changes between Version 2 and Version 3 of Notes_on_CUDA_Semantics


Timestamp: 06/06/14 10:42:10
Author: andrevm

== CUDA Programming Model ==

More information at http://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#programming-model

=== Kernels ===

Declaration Syntax:
{{{
__global__ void kernel_name(formals) {
    ...
}
}}}

Call Syntax:
{{{
kernel_name<<<GridDim, BlockDim, BlockHeapSize, Stream>>>(actuals);
}}}

The `<<<...>>>` is called the Execution Configuration (http://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#execution-configuration).

* `GridDim` - a `dim3` (or `int`) specifying the dimensions of the grid, in number of blocks
* `BlockDim` - a `dim3` (or `int`) specifying the dimensions of each block, in number of threads
* `BlockHeapSize` (optional) - the number of bytes of dynamically allocated shared memory per block (default: 0)
* `Stream` (optional) - the CUDA stream on which to enqueue this kernel (default: 0, the null stream)

The maximum block size is 1024 threads on current GPUs.
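
For concreteness, here is a minimal sketch of a kernel declaration and launch; the kernel name `vec_add`, the pointer names (assumed to already be device pointers), and the size `n` are illustrative, not from the original notes:

{{{
#include <cuda_runtime.h>

// Each thread handles one element.
__global__ void vec_add(const float* a, const float* b, float* c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)                // guard: the grid may cover more than n threads
        c[i] = a[i] + b[i];
}

void launch(const float* d_a, const float* d_b, float* d_c, int n) {
    int block = 256;                      // threads per block (<= 1024)
    int grid = (n + block - 1) / block;   // blocks, rounded up to cover n
    // BlockHeapSize and Stream are omitted here, so they take their
    // defaults: 0 bytes and the null stream.
    vec_add<<<grid, block>>>(d_a, d_b, d_c, n);
}
}}}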

More information at http://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#kernels

=== Thread Hierarchy ===

Inside a kernel, each thread has access to built-in vectors that identify its position in the thread hierarchy.

* `threadIdx` - a 3-component vector specifying the index of the executing thread inside its containing block; `threadIdx.d` is in the range [0, BlockDim.d - 1], where d is the coordinate you wish to use (d belongs to the set {x, y, z})
* `blockIdx` - a 3-component vector specifying the index of the containing block of the executing thread; `blockIdx.d` is in the range [0, GridDim.d - 1]
* `blockDim` - a 3-component vector equal to the `BlockDim` passed to the kernel as part of the execution configuration

The total number of threads launched is `GridDim.x * GridDim.y * GridDim.z * BlockDim.x * BlockDim.y * BlockDim.z`.
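
As a sketch of how these vectors combine in practice (the kernel and variable names are illustrative, not from the original notes), a 2D kernel typically derives a unique global coordinate for each thread:

{{{
__global__ void index_demo(float* img, int width, int height) {
    // 2D grid of 2D blocks: one (col, row) pair per thread.
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    int row = blockIdx.y * blockDim.y + threadIdx.y;
    if (col < width && row < height)
        img[row * width + col] = 1.0f;   // row-major global index
}
}}}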

Thread blocks must be able to execute independently. Threads in the same block can be synchronized using `__syncthreads()` (which acts as a barrier) and can communicate using shared memory.
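
A minimal sketch of both mechanisms (the kernel and variable names are illustrative, not from the original notes): each block stages its slice of the input in shared memory, and `__syncthreads()` guarantees every thread's write is visible before any thread reads.

{{{
__global__ void reverse_each_block(float* data) {
    __shared__ float tile[256];          // one copy per block; assumes
    int t = threadIdx.x;                 // a block size of exactly 256

    tile[t] = data[blockIdx.x * blockDim.x + t];
    __syncthreads();                     // barrier: all writes to tile done

    // Reverse the block's slice in place using the staged copy.
    data[blockIdx.x * blockDim.x + t] = tile[blockDim.x - 1 - t];
}
}}}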

More information at http://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#thread-hierarchy

=== Memory Hierarchy ===

CUDA threads have access to a number of different memory spaces while executing.

* Private - memory accessible only by a single thread
* Shared - memory accessible by all threads in a block
* Global - memory accessible by all threads in all blocks, optimized for general-purpose use
  * Constant - read-only global memory
  * Texture - read-only global memory optimized for certain access patterns and equipped with special access capabilities

All globally accessible memory is persistent across multiple kernel invocations by the same program.
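
As a sketch of how these spaces appear in code (the declaration qualifiers are standard CUDA C; the kernel and variable names are illustrative, not from the original notes):

{{{
__device__   float d_global[256];    // global memory: visible to all threads
__constant__ float d_coeffs[16];     // constant memory: read-only in kernels

__global__ void spaces_demo(void) {
    __shared__ float s_tile[256];    // shared memory: one copy per block
    float r = d_coeffs[0];           // private: per-thread register/local

    s_tile[threadIdx.x] = r;         // assumes a block size of at most 256
    __syncthreads();
    d_global[threadIdx.x] = s_tile[threadIdx.x];
}
}}}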

More information at http://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#memory-hierarchy

=== Heterogeneous Programming ===

CUDA threads are assumed to execute on a device (e.g. a GPU) that is physically separate from the CPU, so the CPU (host) and the GPU (device) have separate memory spaces, called host memory and device memory respectively. Because the host cannot directly access memory on the device, memory management must be performed with calls to the CUDA Runtime.
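
A minimal sketch of the usual allocate/copy/launch/copy-back pattern with the CUDA Runtime (the buffer names and size are illustrative; the kernel launch itself is elided):

{{{
#include <cuda_runtime.h>
#include <stdlib.h>

int main(void) {
    int n = 1 << 20;
    size_t bytes = n * sizeof(float);

    float* h_data = (float*)malloc(bytes);   // host memory
    float* d_data = NULL;                    // device memory

    cudaMalloc((void**)&d_data, bytes);      // allocate on the device
    cudaMemcpy(d_data, h_data, bytes, cudaMemcpyHostToDevice);

    /* ... launch kernels that read and write d_data ... */

    cudaMemcpy(h_data, d_data, bytes, cudaMemcpyDeviceToHost);
    cudaFree(d_data);                        // release device memory
    free(h_data);
    return 0;
}
}}}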

More information at http://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#heterogeneous-programming

== CUDA Translation Rules ==

In progress