Changes between Version 20 and Version 21 of Notes_on_CUDA_Semantics
- Timestamp: 07/27/22 11:42:28 (4 years ago)
Notes_on_CUDA_Semantics
v20  v21
286  286    * `__shfl_xor_sync` - Each thread included in `mask` converges and returns the value of `var` obtained from a source lane with ID calculated from a bitwise exclusive or done on its own `laneID` and the `laneMask` parameter. If the source lane determined by `laneID ^ laneMask` is within the same sub-warp as the calling thread, or in a sub-warp that has lower `threadID`s within the same warp, the data can be exchanged. However, if the source lane is in a sub-warp that has higher `threadID`s or is out of bounds of the current 32 thread warp, the calling thread's original `var` is returned.
287  287
288       - Undefined behavior can be caused by accessing data from an inactive thread or having threads not specified in `mask` run the intrinsics. The code will not run if an invalid `width` is passed in, threads call the intrinsics with different `mask`s, or a thread included in `mask` does not call the collective.
     288  + Undefined behavior can be caused by accessing data from a thread not participating in the call. The code will not run if an invalid `width` is passed in, threads call the intrinsics with inconsistent `mask`s, or a thread included in `mask` does not call the collective.
289  289
290  290    Mask Parameter - https://stackoverflow.com/questions/58833808/insight-into-the-first-argument-mask-in-shfl-sync
291  291
     292  + Our emulation of the `_sync` semantics:
     293  +
     294  + * If a thread t is not in its own mask then it will not synchronize with any other threads
     295  +   * The value returned will be its own value if sourceLane = laneID or sourceLane is outside width in the cases of `up_sync` and `down_sync`
     296  +   * The value returned will be havoced in any other case since there is no guarantee that this thread will synchronize with the requested sourceLane.
     297  + * If a thread t is in its own mask then it will participate in the barrier
     298  +   * If sourceLane is in t's mask then t requests a message from sourceLane and returns the value obtained.
     299  +   * If sourceLane is not in t's mask then we cannot guarantee that sourceLane will participate with us and so we simply make no request and just return a havoced value at the end.
     300  +   * Regardless of these two cases, t will always check for requests sent to it after the barrier call and fulfill these requests.
     301  +
292  302
293  303
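
The return-value rules above can be sketched as a small host-side C++17 model. This is an illustration only, not the emulation's actual code: the names `xor_source_lane`, `emulated_result`, and `Outcome` are hypothetical, and `width` is assumed to be a valid power of two between 1 and 32. Treating an out-of-range source lane as "own value" even when t is in its own mask is also an assumption, carried over from the `__shfl_xor_sync` description rather than stated in the list. The barrier and the request/fulfill message exchange are not modeled.

```cpp
#include <cstdint>
#include <optional>

constexpr int kWarpSize = 32;

// What the emulated shuffle hands back to the calling thread t.
enum class Outcome { OwnValue, ReceivedValue, Havoc };

// True if `lane` (0..31) is named in `mask`.
inline bool in_mask(std::uint32_t mask, int lane) {
    return ((mask >> lane) & 1u) != 0;
}

// Source lane for __shfl_xor_sync as described in the notes:
// laneID ^ laneMask is usable only if it lies inside the warp and in the
// caller's sub-warp or a lower one. Otherwise there is no source lane and
// the caller keeps its own `var` (returned here as std::nullopt).
// Assumes `width` is a valid power of two in [1, 32].
inline std::optional<int> xor_source_lane(int lane_id, int lane_mask, int width) {
    const int src = lane_id ^ lane_mask;
    if (src < 0 || src >= kWarpSize) return std::nullopt;   // outside the warp
    if (src / width > lane_id / width) return std::nullopt; // higher sub-warp
    return src;
}

// Hypothetical model of the emulation rules for a single thread t.
// `src` is the requested source lane, or std::nullopt when it is out of range.
inline Outcome emulated_result(std::uint32_t mask, int lane_id, std::optional<int> src) {
    // Out-of-range source: own value. The notes state this for threads outside
    // their own mask; applying it to threads inside the mask is an assumption.
    if (!src) return Outcome::OwnValue;
    // Reading from oneself always yields one's own value.
    if (*src == lane_id) return Outcome::OwnValue;
    // t is not in its own mask: it does not synchronize, so the result is havoced.
    if (!in_mask(mask, lane_id)) return Outcome::Havoc;
    // t is in its own mask: request from src only if src is also in the mask.
    return in_mask(mask, *src) ? Outcome::ReceivedValue : Outcome::Havoc;
}
```

For example, with a full mask `0xFFFFFFFF`, lane 5 calling with `laneMask = 1` and `width = 32` requests lane 4 and receives its value, while the same call with mask `0x0000000F` (lane 5 not in its own mask) yields a havoced value.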
