Changes between Version 20 and Version 21 of Notes_on_CUDA_Semantics


Timestamp: 07/27/22 11:42:28
Author: Alex Wilton

  • Notes_on_CUDA_Semantics

    v20 v21  
    286 286 * `__shfl_xor_sync` - Each thread included in `mask` converges and returns the value of `var` from the source lane whose ID is the bitwise XOR of the caller's own `laneID` and the `laneMask` parameter. If the source lane `laneID ^ laneMask` is in the same sub-warp as the calling thread, or in a sub-warp with lower `threadID`s within the same warp, the data is exchanged. If the source lane is in a sub-warp with higher `threadID`s, or falls outside the current 32-thread warp, the calling thread's own `var` is returned.
    287 287
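The source-lane rule for `__shfl_xor_sync` described above can be sketched as a small host-side C++ model. This is only an illustration of the value selection as these notes state it, not CUDA device code: `shflXorModel`, `kWarpSize`, and the `vars` array (one `var` per lane) are hypothetical names, and the model assumes every lane participates with a consistent `mask`.

```cpp
#include <array>
#include <cassert>

constexpr int kWarpSize = 32;  // assumed warp size, as in the notes

// Hypothetical model: returns the value lane `laneId` would receive.
// vars[i] stands for the `var` argument supplied by lane i.
int shflXorModel(const std::array<int, kWarpSize>& vars,
                 int laneId, int laneMask, int width = kWarpSize) {
    assert(laneId >= 0 && laneId < kWarpSize);
    assert(width > 0 && width <= kWarpSize);

    const int src = laneId ^ laneMask;  // source lane from bitwise XOR

    // Source lane outside the 32-thread warp: caller keeps its own value.
    if (src < 0 || src >= kWarpSize) return vars[laneId];

    // Source lane in a sub-warp with higher thread IDs: caller keeps its own value.
    if (src / width > laneId / width) return vars[laneId];

    // Same sub-warp, or a sub-warp with lower thread IDs: data is exchanged.
    return vars[src];
}
```

Under this model, with `width = 8` and `laneMask = 8`, lane 13 reads lane 5's value, while lane 5 gets its own value back because lane 13 sits in a higher sub-warp.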
    288 Undefined behavior can be caused by accessing data from an inactive thread or having threads not specified in `mask` run the intrinsics. The code will not run if an invalid `width` is passed in, threads call the intrinsics with different `mask`s, or a thread included in `mask` does not call the collective.
     288 Undefined behavior can be caused by accessing data from a thread not participating in the call. The code will not run if an invalid `width` is passed in, threads call the intrinsics with inconsistent `mask`s, or a thread included in `mask` does not call the collective.
    289 289
    290 290 Mask Parameter - https://stackoverflow.com/questions/58833808/insight-into-the-first-argument-mask-in-shfl-sync
    291 291
     292 Our emulation of the `_sync` semantics:
     293
     294 * If a thread t is not in its own mask, then it will not synchronize with any other threads.
     295   * The value returned will be its own value if `sourceLane` = `laneID`, or if `sourceLane` is outside `width` in the cases of `__shfl_up_sync` and `__shfl_down_sync`.
     296   * The value returned will be havoced in any other case, since there is no guarantee that this thread will synchronize with the requested `sourceLane`.
     297 * If a thread t is in its own mask, then it will participate in the barrier.
     298   * If `sourceLane` is in t's mask, then t requests a message from `sourceLane` and returns the value obtained.
     299   * If `sourceLane` is not in t's mask, then we cannot guarantee that `sourceLane` will participate with t, so t makes no request and returns a havoced value at the end.
     300   * In both of these cases, t always checks for requests sent to it after the barrier call and fulfills them.
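The case analysis above can be sketched as a host-side C++ model. This is a rough illustration under stated assumptions, not the emulation itself: `LaneCall`, `emulateShflSync`, and `sourceOutsideWidth` are hypothetical names, a havoced value is modelled as an empty `std::optional`, and the request/fulfill message exchange after the barrier is collapsed into a direct read of the source lane's `var`, which assumes the source lane does participate. The notes do not say what an in-mask thread does when its source is outside `width`, so the sketch applies only the mask test in that branch.

```cpp
#include <array>
#include <cstdint>
#include <optional>

constexpr int kWarpSize = 32;  // assumed warp size

// Hypothetical record of one lane's call to a shuffle intrinsic.
struct LaneCall {
    std::uint32_t mask;       // mask argument passed by this lane
    int var;                  // value contributed by this lane
    int sourceLane;           // lane this call wants to read from
    bool sourceOutsideWidth;  // up/down case: source fell outside `width`
};

using Result = std::optional<int>;  // std::nullopt models a havoced value

bool inMask(std::uint32_t mask, int lane) {
    return lane >= 0 && lane < kWarpSize && ((mask >> lane) & 1u) != 0;
}

std::array<Result, kWarpSize> emulateShflSync(
        const std::array<LaneCall, kWarpSize>& calls) {
    std::array<Result, kWarpSize> out;
    for (int t = 0; t < kWarpSize; ++t) {
        const LaneCall& c = calls[t];
        if (!inMask(c.mask, t)) {
            // t is not in its own mask: it synchronizes with nobody.
            if (c.sourceLane == t || c.sourceOutsideWidth) {
                out[t] = c.var;        // own value
            } else {
                out[t] = std::nullopt; // havoc
            }
        } else if (inMask(c.mask, c.sourceLane)) {
            // t participates and its source is in its mask: the request is
            // modelled as a direct read (assumes the source lane fulfills it).
            out[t] = calls[c.sourceLane].var;
        } else {
            // Source is not in t's mask: no request is made, result is havoced.
            out[t] = std::nullopt;
        }
    }
    return out;
}
```

A fuller model would represent the barrier and the message queue explicitly, so that a source lane which never calls the collective could be detected rather than assumed away.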
     301
    292 302
    293 303