== OpenMP Primitives ==

Constructs
 * `parallel`
 * worksharing
   * `for`
   * `sections` and `section`
   * `single`
 * synchronization
   * `barrier`
   * `critical`
   * `atomic`
   * `ordered`
 * `master`
 * `threadprivate`
 * `flush`

Clauses
 * `private(list)`
 * `firstprivate(list)`
 * `lastprivate(list)`
 * `copyin(list)`
 * `shared(list)`
 * `default(none|shared)`
 * `num_threads(n)`
 * `collapse(n)`
 * `schedule(static, n)`
 * `schedule(dynamic, n)`
 * ...
 * `ordered`
 * `nowait`
 * `reduction`

Functions
 * `omp_get_num_threads()`
 * `omp_get_thread_num()`

== Support Types ==

 * `$omp_gteam`: global team object, representing a team of threads executing in a parallel region. A handle type. It stores all the state needed to correctly execute a parallel region, including a global barrier and, for every thread, a worksharing queue (an incomplete array of `$omp_work_record`). Definition:
   {{{
   typedef struct OMP_gteam {
     $scope scope;
     int nthreads;
     _Bool init[];
     $omp_work_record work[][];
     $omp_gshared shared[];
     $gbarrier gbarrier;
   } * $omp_gteam;
   }}}
 * `$omp_team`: local object belonging to a single thread and referencing the global team object. A handle type. It also includes the local views of all shared data and a local barrier. Definition:
   {{{
   typedef struct OMP_team {
     $omp_gteam gteam;
     $scope scope;
     int tid;
     $omp_shared shared[];
     $barrier barrier;
   } * $omp_team;
   }}}
 * `$omp_gshared`: global shared object, containing a reference to a shared variable. A handle type. Definition:
   {{{
   typedef struct OMP_gshared {
     _Bool init[];
     void * original;
   } * $omp_gshared;
   }}}
 * `$omp_shared`: local view of a shared object, belonging to a single thread. It holds a reference to the global object, a local copy of the shared object, and a status for it. The type of the status variable is obtained from the type of the original variable by replacing all leaf nodes in the type tree with `int`. A handle type.
   {{{
   typedef struct OMP_shared {
     $omp_gshared gshared;
     int tid;
     void * local;
     void * status;
   } * $omp_shared;
   }}}
 * `$omp_work_record`: the worksharing information that a thread needs to execute a worksharing region. It contains the kind of the region, the location of the region, the status of the region, and the subdomain (the iterations, sections, or task assigned to the thread).
   {{{
   typedef struct OMP_work_record {
     int kind;          // loop, barrier, sections, or single
     int location;      // location in model of construct
     _Bool arrived;     // has this thread arrived yet?
     $domain subdomain; // tasks this thread must do
   } $omp_work_record;
   }}}
 * `$omp_var_status`: an enumeration type for the status of a shared component. The enumerators are `EMPTY`, `FULL`, and `MODIFIED`.

== Support Functions ==

=== Team creation and destruction ===

 * `$omp_gteam $omp_gteam_create($scope scope, int nthreads)`
   * creates a new global team object, allocated on the heap in the specified scope. The team will contain `nthreads` threads.
 * `void $omp_gteam_destroy($omp_gteam gteam)`
   * destroys the global team object. All shared objects associated with the team must have been destroyed before this function is called.
 * `$omp_team $omp_team_create($scope scope, $omp_gteam gteam, int tid)`
   * creates a new local team object for a specific thread.
 * `void $omp_team_destroy($omp_team team)`
   * destroys the local team object.

=== Shared variables ===

 * `$omp_gshared $omp_gshared_create($omp_gteam, void *original)`
   * creates a new global shared object associated with the given global team. `original` is a pointer to the shared variable that this object corresponds to.
 * `void $omp_gshared_destroy($omp_gshared gshared)`
   * destroys the global shared object, copying its contents to the original variable.
 * `$omp_shared $omp_shared_create($omp_team team, $omp_gshared gshared)`
   * creates a local shared object, returning a handle to it.
     The local copy of the shared object is initialized by copying the values from the original variable referenced by the gshared object. The created shared object is appended to the shared queue of the `$omp_team` object.
 * `void $omp_shared_destroy($omp_shared shared)`
   * destroys the local shared object.
 * `void $omp_read($omp_shared shared, void *result, void *ref)`
   * called by a thread to read a shared object. `ref` is a pointer into the local copy of the shared variable. The result of the read is stored in the memory unit pointed to by `result`. Assumes `ref` is a pointer to a scalar.
 * `void $omp_write($omp_shared shared, void *ref, void *value)`
   * called by a thread to write to the shared object. `ref` is a pointer into the local copy of the shared variable. The value to be written is taken from the memory unit pointed to by `value`. Assumes `ref` is a pointer to a scalar.
 * `void $omp_apply_assoc($omp_shared shared, void *local, $operation op, void *value)`
   * applies the associative operator specified by `op` to the local value and the corresponding shared value, and writes the result back to the shared object. This happens in one atomic step. Example: to add some value to a shared variable, use `CIVL_SUM` for `op`. Assumes `local` is a pointer to a scalar.
 * `void $omp_flush($omp_shared shared, void *ref)`
   * performs an OpenMP flush operation on the shared object.
 * `void $omp_flush_all($omp_team)`
   * performs an OpenMP flush operation on all shared objects. This is the default in OpenMP if no argument is specified for a flush construct.

=== Worksharing and barriers ===

 * `void $omp_barrier($omp_team team)`
   * performs a barrier only. Note however that usually (always?) a barrier is accompanied by a flush-all, so `$omp_barrier_and_flush` should be used instead.
 * `void $omp_barrier_and_flush($omp_team team)`
   * combines a barrier and a flush on all shared objects owned by the team. Implicit in many OpenMP worksharing constructs.
 * `$domain $omp_arrive_loop($omp_team, $domain loop_dom)`
   * called by a thread when it reaches an omp for loop. Returns the subset of the loop domain specifying the iterations that this thread will execute. The dimension of the returned domain equals the dimension of the given domain `loop_dom`.
 * `$domain(1) $omp_arrive_sections($omp_team, int numSections)`
   * called by a thread when it reaches an omp sections construct. Returns the subset of the integers 0..numSections-1 specifying the indexes of the sections that this thread will execute. The sections are numbered from 0 in increasing order.
 * `int $omp_arrive_single($omp_team team)`
   * called by a thread when it reaches an omp single construct. Returns the thread ID of the thread that will execute the single construct.

== Memory model ==

This section describes how the OpenMP memory model is represented. The protocols below are used in the implementations of the system functions that deal with shared objects.

For each shared variable `v`, a thread has a local copy `_v` and a local status variable `v_state` (both are implemented as part of `$omp_shared`). `_v` has the same type as `v`. The type of `v_state` is obtained from the type of `v` by replacing all primitive types (leaf nodes in the type tree) by `int`. Initially all these ints are -1. Both variables are declared in thread-private scope.
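The rule for deriving the status type can be illustrated on a compound type. The sketch below is plain C rather than CIVL-C, and the names `struct particle`, `struct particle_status`, and `particle_status_leaves` are hypothetical, introduced only for this illustration. It shows a possible shared type next to the status type obtained by replacing every leaf with `int`.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical type of a shared variable v (illustration only). */
struct particle {
    double pos[3];  /* three double leaves */
    float  mass;    /* one float leaf */
    char   tag;     /* one char leaf */
};

/* Status type for v: the same type tree with every leaf replaced by int. */
struct particle_status {
    int pos[3];
    int mass;
    int tag;
};

/* Number of int leaves in the status type, one per leaf of the original. */
static size_t particle_status_leaves(void) {
    return sizeof(struct particle_status) / sizeof(int);
}

/* The array shape is preserved, only the element type changes. */
static void check_status_shape(void) {
    struct particle p;
    struct particle_status s;
    assert(sizeof p.pos / sizeof p.pos[0] == sizeof s.pos / sizeof s.pos[0]);
}
```

Because the status type mirrors the shape of the original, a pointer into the local copy can be mapped to the matching status cell by following the same path of member and index selections.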
For example, given a shared variable `a`, there will be:
 * shared copy of variable: `double a[N]; // declared in shared scope`
 * local copy of variable: `double a_local[N]; // declared in thread-local scope`
 * local status variable: `int a_status[N]; // declared in thread-local scope`

Interpretation of status value:
 * `0=EMPTY`: local is empty
 * `1=FULL`: local is occupied, no writes to it have been made
 * `2=MODIFIED`: local is occupied, writes have been made to it

Initially: `local=shared`, `status=FULL`

Protocols for reads, writes, and flushes:

A read from (some part of) the shared variable by thread tid:
 * if the status value is `EMPTY`, copy the shared data into the local copy and set the status to `FULL`;
 * read and return the data held by the local copy.
{{{
read (ptr into a_local[i]):
  if (status is EMPTY) {
    copy a[i] to a_local[i];
    set a_status[i] to FULL;
  }
  read a_local[i];
}}}

A write to (some part of) the shared variable by thread tid:
 * write the data to the local copy;
 * update the status to `MODIFIED`.
{{{
write (ptr into a_local[i]):
  write to a_local[i];
  set a_status[i] to MODIFIED;
}}}

A `flush` of (some part of) the shared variable by thread tid:
 * if the status value is `EMPTY`: no-op;
 * if the status value is `FULL`: update the status to `EMPTY` and set the local copy to the default value;
 * if the status value is `MODIFIED`: copy the local copy to the shared copy, update the status to `EMPTY`, and set the local copy to the default value.
{{{
flush (some set of memory units):
  for each memory unit specified:
    switch (status) {
    case EMPTY:
      break; // nothing to do
    case MODIFIED:
      copy local to shared;
      // fall through
    case FULL:
      set status to EMPTY;
      set local to default value;
      break;
    }
}}}

The function `$omp_barrier_and_flush` performs a barrier on the team and a flush on all shared variables. After this completes, all local copies will agree with each other and with the shared copy of the variable, and all state variables will be -1.
{{{
barrier_and_flush(): // collective operation on all shared objects
  barrier();
  for each shared memory unit:
    assert there is at most one thread for which this memory unit has status MODIFIED;
    flush(memory unit);
  barrier();
}}}

== Worksharing model ==

This section describes how the system functions dealing with worksharing are implemented.

The global data structure `$omp_gteam` contains a FIFO queue for each thread. The queue contains work-sharing records, one record for each work-sharing or barrier construct encountered. The record contains the basic information about the construct as provided by the arguments to the arrival function, as well as the distribution chosen for that thread.

The constructs are a lot like MPI collective operations, and are modeled similarly. When a thread arrives at one of these constructs, it invokes the relevant arrival function. At this point you can determine whether this thread is the first to arrive at that construct. If its queue is empty, it is the first, otherwise it is not first, and the oldest entry in its queue will be the entry corresponding to this construct.

When a thread is the first thread to arrive at a construct, a distribution is chosen for every thread and a record is created and enqueued in each thread queue (including the caller). The distributions can be chosen nondeterministically, possibly with some restrictions to achieve some tractability/soundness compromise. The record for this thread is then dequeued and the iterator returned.

If a thread is not the first to arrive, its record is dequeued and compared with the arguments given in the function call. They should match, and if they don't, an error is reported. This indicates that either threads encountered constructs in different orders or the loop parameters changed.

== Translations of specific directives ==

=== Translating `parallel` ===

`parallel`: this spawns some nondeterministic number of threads.
We will assume there is a constant `THREAD_MAX` defined somewhere. The number of threads created will be between 1 and `THREAD_MAX` (inclusive). Each thread is assigned an ID. The original ("master") thread has ID 0. All threads execute the parallel region.
{{{
float x; // shared
int y;   // private
#pragma omp parallel shared(x) private(y)
{
  ...
  x=5.2;
  y=3;
  ...
}
}}}
=>
{{{
float x;
int y;
{ // begin parallel construct
  int _nthreads = 1+$choose_int(THREAD_MAX);
  $omp_gteam gteam = $omp_gteam_create($here, _nthreads);
  $omp_gshared x_gshared = $omp_gshared_create(gteam, &x);
  $parfor (int _tid : {0.._nthreads-1}) {
    $omp_team team = $omp_team_create($here, gteam, _tid);
    $omp_shared x_shared = $omp_shared_create(team, x_gshared);
    int _y; // private variable
    ...
    { // "x=5.2":
      float tmp = 5.2;
      $omp_write(x_shared, &(x_shared->local), &tmp);
    }
    _y = 3;
    ...
    $omp_barrier_and_flush(team); // implicit at end of parallel region
    $omp_shared_destroy(x_shared);
    $omp_team_destroy(team);
  } // end $parfor
  $omp_gshared_destroy(x_gshared);
  $omp_gteam_destroy(gteam);
} // end parallel construct
}}}

Every variable that occurs in the parallel construct, i.e., in the lexical extent of the parallel construct, must be determined to be either private or shared. This is determined by the clauses and the default rules specified in the OpenMP Standard. Any variable declared within the construct itself must be private.

For each private variable `y` not declared within the parallel construct, create a new variable of the same type, `_y`. The new variable is declared within the thread scope. If `y` is also firstprivate, then `_y` is initialized with the value of `y`, e.g. `int _y=y;`. Otherwise, `_y` is uninitialized and so has an undefined value.

=== Translating `for` ===

Try to determine whether the loop iterations are independent. In that case, they can all be executed by one thread.
Otherwise:
{{{
#pragma omp for
for (i=0; i<n; i++)
  S
}}}
=>
{{{
{
  $domain loop_domain = {0..n-1};
  $domain(1) my_iters = ($domain(1))$omp_arrive_loop(team, loop_domain);
  $for (int i : my_iters) {
    translate(S);
  }
  $omp_barrier_and_flush(team);
}
}}}

We can vary the way the sub-domains are chosen to explore different tradeoffs and strategies. At one extreme, every kind of partition can be explored; at the other, some fixed strategy like round-robin with chunk size 1 can be used. This only changes the definition of `$omp_arrive_loop`, not the translation above.
{{{
#pragma omp parallel for collapse(3)
for (i=0; i<n; i++)
  for (j=0; j<m; j++)
    for (k=0; k<l; k++)
      S
}}}
=>
{{{
{
  $domain loop_domain = {0..n-1, 0..m-1, 0..l-1};
  $domain(3) my_iters = ($domain(3))$omp_arrive_loop(team, loop_domain);
  $for (int i, j, k : my_iters) {
    translate(S);
  }
  $omp_barrier_and_flush(team);
}
}}}

=== Translating `reduction` clause ===

{{{
#pragma omp for reduction(+:x,y)
for (i=a; i<b; i++)
  S
}}}
=>
{{{
{
  $domain loop_domain = {a..b-1};
  $domain(1) my_iters = ($domain(1))$omp_arrive_loop(team, loop_domain);
  double _x=0.0, _y=0.0;
  $for (int i : my_iters) {
    translate(S) but replace x with _x and y with _y;
  }
  $omp_apply_assoc(x_shared, CIVL_SUM, &_x);
  $omp_apply_assoc(y_shared, CIVL_SUM, &_y);
  $omp_barrier_and_flush(team);
}
}}}

=== Translating `sections` ===

Say there are `numSections` sections. This number is known statically.
{{{
#pragma omp sections
#pragma omp section
S0
#pragma omp section
S1
...
}}}
=>
{{{
{
  $domain(1) my_secs = $omp_arrive_sections(team, numSections);
  $for (int i : my_secs) {
    switch (i) {
    case 0: { translate(S0); break; }
    case 1: { translate(S1); break; }
    ...
    } /* end of switch */
  } /* end of $for loop */
  $omp_barrier_and_flush(team);
}
}}}

=== Translating `single` ===

{{{
#pragma omp single
S
}}}
=>
{{{
int owner = $omp_arrive_single(team);
if (owner == _tid) {
  translate(S);
}
$omp_barrier_and_flush(team);
}}}

=== Translating `barrier` ===

{{{
#pragma omp barrier
}}}
=>
{{{
$omp_barrier_and_flush(team);
}}}

=== Translating `critical` ===

Use a lock for each critical name, plus one for the unnamed case. Every thread must obtain the lock to enter the critical section and release it on exit. That is, if there are critical sections named a, b, and c, there should be global root-scope variables of boolean type named `_critical_noname`, `_critical_a`, etc.
{{{
#pragma omp critical(a)
S
}}}
=>
{{{
...
_Bool _critical_a = $false;
. . .
$when (!_critical_a) _critical_a=$true;
translate(S);
_critical_a=$false;
}}}

=== Translating `atomic` ===

In general, reads and writes to shared variables will be processed using the protocols described above. However, if the operation occurs within an omp atomic construct, it is translated differently.

TODO: need to look up the rules on the different flavors of atomics.

If sequentially consistent atomic...

If non-sequentially consistent atomic...

=== Translating `ordered` ===

This can only be used inside an OpenMP `for` loop whose pragma uses the `ordered` clause. (Check that.) It indicates that the specified region must be executed in iteration order. In this case the system function must return an int iterator in which the ints occur in loop order.
{{{
#pragma omp for ordered
for (i=a; i<=b; i++) {
  ...
  #pragma omp ordered
  S1
  ...
  #pragma omp ordered
  S2
  ...
}
}}}
=>
{{{
{
  $domain loop_domain = {a..b};
  $domain(1) my_iters = ($domain(1))$omp_arrive_loop(team, loop_domain);
  int order1=a, order2=a;
  $for (int i : my_iters) {
    ...
    $when (order1==i) { translate(S1); order1++; }
    ...
    $when (order2==i) { translate(S2); order2++; }
    ...
  }
}
}}}

=== Translating `master` ===

{{{
#pragma omp master
S
}}}
=>
{{{
if (_tid == 0) {
  translate(S);
}
}}}

=== Translating `nowait` ===

Just leave out the `$omp_barrier_and_flush` at the end of the translated construct.

== Translating functions ==

 * `omp_get_num_threads()` => `_nthreads`
 * `omp_get_thread_num()` => `_tid`
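
For reference, the properties that `_nthreads` and `_tid` have to preserve can be observed in an ordinary OpenMP program. The following is a minimal sketch in plain C, not CIVL-C. The helper `record_team`, the cap `MAX_THREADS`, and the serial fallback stubs are assumptions made for this example; only `omp_get_num_threads()` and `omp_get_thread_num()` come from OpenMP itself. Each thread records its own ID, and the thread with ID 0 records the team size.

```c
#include <assert.h>
#ifdef _OPENMP
#include <omp.h>
#else
/* Serial fallback stubs (an assumption, for builds without OpenMP). */
static int omp_get_num_threads(void) { return 1; }
static int omp_get_thread_num(void) { return 0; }
#endif

#define MAX_THREADS 64 /* hypothetical cap, used only by this sketch */

/* Runs one parallel region. Sets seen[t] to 1 for every thread ID t that
 * took part and returns the team size reported inside the region. */
static int record_team(int seen[MAX_THREADS]) {
    int nthreads = 0;
    for (int i = 0; i < MAX_THREADS; i++) seen[i] = 0;

    #pragma omp parallel num_threads(4)
    {
        int tid = omp_get_thread_num();  /* translated to _tid */
        int n = omp_get_num_threads();   /* translated to _nthreads */
        if (tid < MAX_THREADS) seen[tid] = 1; /* each thread writes its own cell */
        if (tid == 0) nthreads = n;           /* only thread 0 writes */
    }
    return nthreads;
}
```

After a call such as `int n = record_team(seen);`, exactly the IDs `0..n-1` are marked, which matches the range `{0.._nthreads-1}` iterated over by the `$parfor` in the translation of `parallel`.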