OpenMP Primitives
Constructs
- parallel
- worksharing: for, sections and section, single
- synchronization: barrier, critical, atomic, ordered, master
- threadprivate
- flush
Clauses
- private(list)
- firstprivate(list)
- lastprivate(list)
- copyin(list)
- shared(list)
- default(none|shared)
- num_threads(n)
- collapse(n)
- schedule(static, n), schedule(dynamic, n), ...
- ordered
- nowait
- reduction
Functions
- omp_get_num_threads()
- omp_get_thread_num()
Support Types
$omp_gteam: global team object, represents a team of threads executing in a parallel region. A handle type. This is where all the state needed to correctly execute a parallel region will be stored. This includes a global barrier and a worksharing queue (incomplete array-of-$omp_work_record) for every thread. Definition:
typedef struct OMP_gteam {
  $scope scope;
  int nthreads;
  _Bool init[];
  $omp_work_record work[][];
  $omp_gshared shared[];
  $gbarrier gbarrier;
} * $omp_gteam;
$omp_team: local object belonging to a single thread and referencing the global team object. A handle type. It also includes the local views of all shared data and a local barrier. Definition:
typedef struct OMP_team {
  $omp_gteam gteam;
  $scope scope;
  int tid;
  $omp_shared shared[];
  $barrier barrier;
} * $omp_team;
$omp_gshared: global shared object, containing a reference to a shared variable. A handle type. Definition:
typedef struct OMP_gshared {
  _Bool init[];
  void * original;
} * $omp_gshared;
$omp_shared: local view of a shared object, belonging to a single thread, with reference to the global object, and a local copy and a status of the shared object. The type of the status variable is obtained from the type of the original variable by replacing all leaf nodes in the type tree with "int". A handle type. Definition:
typedef struct OMP_shared {
  $omp_gshared gshared;
  int tid;
  void * local;
  void * status;
} * $omp_shared;
$omp_work_record: the worksharing information that a thread needs for executing a worksharing region. It contains the kind of the worksharing region, the location of the region, the status of the region and the subdomain (iterations/sections/task assigned to the thread). Definition:
typedef struct OMP_work_record {
  int kind; // loop, barrier, sections, or single
  int location; // location in model of construct
  _Bool arrived; // has this thread arrived yet?
  $domain loop_dom; // full loop domain; null if not loop
  $domain subdomain; // tasks this thread must do
} $omp_work_record;
$omp_var_status: an enumeration type for the status of a shared component. Available enumerators are: EMPTY, FULL, MODIFIED.
Support Functions
Team creation and destruction
$omp_gteam $omp_gteam_create($scope scope, int nthreads)
- creates new global team object, allocating object in heap in the specified scope. Number of threads that will be in the team is nthreads.
void $omp_gteam_destroy($omp_gteam gteam)
- destroys the global team object. All shared objects associated to the team must have been destroyed before calling this function.
$omp_team $omp_team_create($scope scope, $omp_gteam gteam, int tid)
- creates new local team object for a specific thread.
void $omp_team_destroy($omp_team team)
- destroys the local team object
Shared variables
Note: none of those variables that comprise a shared object should ever be accessed directly. All access must happen through $omp_read/write, including the local views, status, and shared view.
$omp_gshared $omp_gshared_create($omp_gteam, void *original)
- creates new global shared object, associated to the given global team. A pointer to the shared variable that this object corresponds to is given.
void $omp_gshared_destroy($omp_gshared gshared)
- destroys the global shared object, copying the content to the original variable
$omp_shared $omp_shared_create($omp_team team, $omp_gshared gshared, void *local, void *status)
- creates a local shared object, returning a handle to it. The local copy of the shared object is initialized by copying the values from the original variable referenced by the gshared object. The status variable is initialized to FULL. The created shared object is appended to the shared queue of the $omp_team object.
void $omp_shared_destroy($omp_shared shared)
- destroys the local shared object
void $omp_read($omp_shared shared, void *result, void *ref)
- called by a thread to read a shared object. ref is a pointer into the local copy of the shared variable. The result of the read is stored in the memory unit pointed to by result. Assumes ref is a pointer to a scalar.
void $omp_write($omp_shared shared, void *ref, void *value)
- called by a thread to write to the shared object. ref is a pointer into the local copy of the shared variable. The value to be written is taken from the memory unit pointed to by value. Assumes ref is a pointer to a scalar.
void $omp_apply_assoc($omp_shared shared, $operation op, void *local)
- applies the associative operator specified by op to the local copy and the corresponding shared copy, and writes the result back to the shared copy. This happens in one atomic step. Example: you can use this to add some value to a shared variable, using CIVL_SUM for op. Assumes local is a pointer to a scalar.
void $omp_flush($omp_shared shared, void *ref)
- performs an OpenMP flush operation on the shared object
void $omp_flush_all($omp_team)
- performs an OpenMP flush operation on all shared objects. This is the default in OpenMP if no argument is specified for a flush construct.
Worksharing and barriers
void $omp_barrier($omp_team team)
- performs a barrier only. Note however that usually (always?) a barrier is accompanied by a flush-all, so $omp_barrier_and_flush should be used instead.
void $omp_barrier_and_flush($omp_team team)
- combines a barrier and a flush on all shared objects owned by the team. Implicit in many OpenMP worksharing constructs.
$domain $omp_arrive_loop($omp_team team, int location, $domain loop_dom, $DecompositionStrategy strategy)
- called by a thread when it reaches an omp for loop, this function returns the subset of the loop domain specifying the iterations that this thread will execute. The dimension of the domain returned equals the dimension of the given domain loop_dom.
$domain(1) $omp_arrive_sections($omp_team team, int location, int numSections)
- called by a thread when it reaches an omp sections construct, this function returns the subset of the integers 0..numSections-1 specifying the indexes of the sections that this thread will execute. The sections are numbered from 0 in increasing order.
int $omp_arrive_single($omp_team team, int location)
- called by a thread when it reaches an omp single construct, returns the thread ID of the thread that will execute the single construct.
Memory model
This section describes how the memory model is modeled. These protocols are used in the implementations of the system function dealing with shared objects.
For each shared variable v, a thread has a local copy _v and a local status variable v_state (both are implemented as part of $omp_shared). _v has the same type as v. The type of v_state is obtained from the type of v by replacing all primitive types (leaf nodes in the type tree) by int. Initially all these ints are FULL, matching the initialization performed by $omp_shared_create. Both variables are declared in thread private scope.
For example, given a shared variable a, there will be:
shared copy of variable:
double a[N]; // declared in shared scope
local copy of variable:
double a_local[N]; // declared in thread-local scope
local status variable:
int a_status[N]; // declared in thread-local scope
interpretation of status value:
0=EMPTY: local is empty
1=FULL: local is occupied, no writes to it have been made
2=MODIFIED: local is occupied, writes have been made to it
Initially: local=shared, status=FULL
Protocols for reads, writes, and flushes:
A read from (some part of) the shared variable by thread tid:
- if the status value is EMPTY, copy the shared data into the local copy;
- read and return the data held by the local copy.
read (ptr into a_local[i]):
  if (status is EMPTY) {
    copy a[i] to a_local[i];
    set a_status[i] to FULL;
  }
  read a_local[i];
A write to (some part of) the shared variable by thread tid:
- write the data to the local copy;
- update the status to MODIFIED.
write (ptr into a_local[i]):
  write to a_local[i];
  set a_status[i] to MODIFIED;
A flush of (some part of) the shared variable by thread tid:
- if the status value is EMPTY: no op;
- if the status value is FULL: update the status to EMPTY and set the local copy to the default value;
- if the status value is MODIFIED: copy the local copy to the shared copy, update the status to EMPTY, and set the local copy to the default value.
flush (some set of memory units):
  for each memory unit specified:
    switch (status) {
      case EMPTY:
        break; // nothing to do
      case MODIFIED:
        copy local to shared; // then fall through to FULL
      case FULL:
        set status to EMPTY;
        set local to default value;
        break;
    }
The function $omp_barrier_and_flush performs a barrier on the team and a flush on all shared variables. After this completes, all local copies will agree with each other and with the shared copy of the variable, and all status variables will be EMPTY.
barrier_and_flush(): // collective operation on all shared objects
  barrier();
  for each shared memory unit:
    assert there is at most one thread for which this memory unit
      has status MODIFIED;
    flush(memory unit);
  barrier();
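The read, write, and flush protocols described in this section can be simulated in plain C. The following is only a sketch under stated assumptions: the names model_read, model_write, model_flush, and memory_model_demo are hypothetical, the array length and thread count are arbitrary, and threads are simulated by an index rather than run concurrently. It is not the CIVL implementation of $omp_read, $omp_write, or $omp_flush.

```c
#include <assert.h>

#define N 4          /* length of the shared array (illustrative) */
#define NTHREADS 2   /* number of simulated threads (illustrative) */

enum status { EMPTY = 0, FULL = 1, MODIFIED = 2 };

static double a[N];                        /* shared copy */
static double a_local[NTHREADS][N];        /* one local copy per thread */
static enum status a_status[NTHREADS][N];  /* one status array per thread */

/* Initially: local = shared, status = FULL. */
void init_views(void) {
    for (int t = 0; t < NTHREADS; t++)
        for (int i = 0; i < N; i++) {
            a_local[t][i] = a[i];
            a_status[t][i] = FULL;
        }
}

/* Read protocol: refill the local copy from shared only if it is EMPTY. */
double model_read(int tid, int i) {
    if (a_status[tid][i] == EMPTY) {
        a_local[tid][i] = a[i];
        a_status[tid][i] = FULL;
    }
    return a_local[tid][i];
}

/* Write protocol: write locally and mark the unit MODIFIED. */
void model_write(int tid, int i, double value) {
    a_local[tid][i] = value;
    a_status[tid][i] = MODIFIED;
}

/* Flush protocol for one memory unit. */
void model_flush(int tid, int i) {
    switch (a_status[tid][i]) {
    case EMPTY:
        break;                      /* nothing to do */
    case MODIFIED:
        a[i] = a_local[tid][i];     /* publish the local write */
        /* fall through */
    case FULL:
        a_status[tid][i] = EMPTY;
        a_local[tid][i] = 0.0;      /* default value */
        break;
    }
}

/* Returns 1 if the scenario behaves as the protocols predict. */
int memory_model_demo(void) {
    for (int i = 0; i < N; i++)
        a[i] = 1.0;
    init_views();
    model_write(0, 2, 5.5);                  /* thread 0 writes locally */
    int stale = (model_read(1, 2) == 1.0);   /* thread 1 still sees its FULL copy */
    model_flush(0, 2);                       /* thread 0 publishes 5.5 */
    model_flush(1, 2);                       /* thread 1 discards its copy */
    int fresh = (model_read(1, 2) == 5.5);   /* EMPTY, so refilled from shared */
    return stale && fresh && a[2] == 5.5;
}
```

Calling memory_model_demo() from a main function exercises the scenario: a write by one thread is invisible to the other until both have flushed the memory unit.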
Worksharing model
This section describes how the system functions dealing with worksharing are implemented.
The global data structure $omp_gteam contains a FIFO queue for each thread. The queue contains work-sharing records, one record for each work-sharing or barrier construct encountered. The record contains the basic information about the construct as provided by the arguments to the arrival function, as well as the distribution chosen for that thread.
The constructs are a lot like MPI collective operations, and are modeled similarly.
When a thread arrives at one of these constructs, it invokes the relevant arrival function. At this point you can determine whether this thread is the first to arrive at that construct. If its queue is empty, it is the first, otherwise it is not first, and the oldest entry in its queue will be the entry corresponding to this construct.
When a thread is the first thread to arrive at a construct, a distribution is chosen for every thread and a record is created and enqueued in each thread queue (including the caller). The distributions can be chosen nondeterministically, possibly with some restrictions to achieve some tractability/soundness compromise. The record for this thread is then dequeued and the iterator returned.
If a thread is not the first to arrive, its record is dequeued and compared with the arguments given in the function call. They should match, and if they don't, an error is reported. This indicates that either threads encountered constructs in different orders or the loop parameters changed.
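The queue discipline above can be sketched in plain C. This is a simplified illustration, not the CIVL implementation: the names work_record, arrive_loop, reset_queues, and worksharing_demo are hypothetical, the team size is fixed, the loop domain is reduced to 0..n-1, and the distribution is a fixed block decomposition rather than a nondeterministic choice.

```c
#include <assert.h>

#define NTHREADS 3   /* team size (illustrative) */
#define QCAP 8       /* queue capacity (illustrative) */

enum { KIND_LOOP = 0 };

typedef struct {
    int kind;       /* kind of construct */
    int location;   /* location of the construct */
    int n;          /* loop domain is 0..n-1 */
    int lo, hi;     /* this thread runs iterations lo..hi-1 */
} work_record;

typedef struct {
    work_record items[QCAP];
    int head;    /* index of the oldest record */
    int count;   /* number of queued records */
} work_queue;

static work_queue queues[NTHREADS];   /* one FIFO queue per thread */

static void enqueue(work_queue *q, work_record r) {
    assert(q->count < QCAP);
    q->items[(q->head + q->count) % QCAP] = r;
    q->count++;
}

static work_record dequeue(work_queue *q) {
    assert(q->count > 0);
    work_record r = q->items[q->head];
    q->head = (q->head + 1) % QCAP;
    q->count--;
    return r;
}

void reset_queues(void) {
    for (int t = 0; t < NTHREADS; t++) {
        queues[t].head = 0;
        queues[t].count = 0;
    }
}

/* Returns 0 and sets *lo, *hi on success; returns -1 if the oldest record
   does not match the arguments (constructs met in a different order, or
   loop parameters changed). */
int arrive_loop(int tid, int location, int n, int *lo, int *hi) {
    assert(tid >= 0 && tid < NTHREADS);
    if (queues[tid].count == 0) {
        /* Empty queue: this thread is first, so it records a distribution
           for every thread, including itself. */
        for (int t = 0; t < NTHREADS; t++) {
            work_record r = { KIND_LOOP, location, n,
                              t * n / NTHREADS, (t + 1) * n / NTHREADS };
            enqueue(&queues[t], r);
        }
    }
    work_record mine = dequeue(&queues[tid]);
    if (mine.kind != KIND_LOOP || mine.location != location || mine.n != n)
        return -1;
    *lo = mine.lo;
    *hi = mine.hi;
    return 0;
}

/* Returns 1 if the scenario behaves as described. */
int worksharing_demo(void) {
    int lo, hi;
    reset_queues();
    if (arrive_loop(0, 7, 9, &lo, &hi) != 0 || lo != 0 || hi != 3) return 0;
    if (arrive_loop(0, 8, 6, &lo, &hi) != 0 || lo != 0 || hi != 2) return 0;
    if (arrive_loop(1, 7, 9, &lo, &hi) != 0 || lo != 3 || hi != 6) return 0;
    /* Thread 2 skips location 7 and arrives at location 8: mismatch. */
    if (arrive_loop(2, 8, 6, &lo, &hi) != -1) return 0;
    return 1;
}
```

In the demo, thread 0 runs ahead through two constructs, thread 1 later finds its record for the first construct waiting in its queue, and thread 2 triggers the mismatch error by arriving at the constructs in a different order.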
Translations of specific directives
Translating parallel
parallel: this spawns some nondeterministic number of threads. We will assume there is a constant THREAD_MAX defined somewhere. The number of threads created will be between 1 and THREAD_MAX (inclusive). Each thread is assigned an ID. The original ("master") thread has ID 0. All threads execute the parallel region.
float x; // shared
int y; // private
#pragma omp parallel shared(x) private(y)
{
...
x=5.2;
y=3;
...
}
=>
float x;
int y;
{ // begin parallel construct
  int _nthreads = 1+$choose_int(THREAD_MAX);
  $omp_gteam gteam = $omp_gteam_create($here, _nthreads);
  $omp_gshared x_gshared = $omp_gshared_create(gteam, &x);
  $parfor (int _tid : {0.._nthreads-1}) {
    $omp_team team = $omp_team_create($here, gteam, _tid);
    float x_local; // local copy of x
    int x_status; // status of the local copy
    $omp_shared x_shared = $omp_shared_create(team, x_gshared, &x_local, &x_status);
    int _y; // private variable
    ...
    { // "x=5.2":
      float tmp = 5.2;
      $omp_write(x_shared, &x_local, &tmp);
    }
    _y = 3;
    ...
    $omp_barrier_and_flush(team); // implicit at end of parallel region
    $omp_shared_destroy(x_shared);
    $omp_team_destroy(team);
  } // end $parfor
  $omp_gshared_destroy(x_gshared);
  $omp_gteam_destroy(gteam);
} // end parallel construct
All variables that occur in the parallel construct, i.e., the lexical extent of the parallel construct, must be determined to be either private or shared. This is determined by the clauses and the default rules as specified in the OpenMP Standard. Obviously any variable declared within the construct itself must be private.
For all private variables y not declared within the parallel construct, create a new variable of the same type, _y. The new variable is declared within the thread scope. If y is also firstprivate, then _y is initialized with the value of y, e.g. int _y=y;. Otherwise, _y is uninitialized, so has an undefined value.
Translating for
Try to determine whether the loop iterations are independent. In that case, they can all be executed by one thread. Otherwise:
#pragma omp for
for (i=0; i<n; i++)
  S
=>
{
  $domain loop_domain = {0..n-1};
  $domain(1) my_iters = ($domain(1))$omp_arrive_loop(team, FOR_LOC++, loop_domain, STRATEGY);
  $for (int i : my_iters) {
    translate(S);
  }
  $omp_barrier_and_flush(team);
}
We can vary the way the sub-domains are chosen to explore different tradeoffs and strategies. On one extreme, every kind of partition can be explored; on the other, some fixed strategy like round-robin with chunksize 1 can be used. This only changes the definition of $omp_arrive_loop, not the translation above.
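As an illustration of the fixed-strategy extreme, round-robin distribution with chunk size 1 can be written down in a few lines of C. This is a sketch only: the function names are hypothetical, the loop domain is assumed to be the one-dimensional range 0..n-1, and nothing here reflects how $omp_arrive_loop is actually defined.

```c
#include <assert.h>

/* Thread that owns iteration i under round-robin with chunk size 1. */
int round_robin_owner(int i, int nthreads) {
    return i % nthreads;
}

/* Number of iterations of 0..n-1 assigned to thread tid. */
int round_robin_count(int n, int nthreads, int tid) {
    int count = 0;
    for (int i = tid; i < n; i += nthreads)
        count++;
    return count;
}

/* The k-th iteration (k counted from 0) run by thread tid,
   or -1 if that thread has fewer than k+1 iterations. */
int round_robin_iter(int n, int nthreads, int tid, int k) {
    int i = tid + k * nthreads;
    return (k >= 0 && i < n) ? i : -1;
}
```

For n = 7 and three threads, thread 0 gets iterations 0, 3, 6, thread 1 gets 1, 4, and thread 2 gets 2, 5. A strategy like this explores a single partition, which keeps the state space small at the cost of missing behaviors that only appear under other partitions.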
#pragma omp parallel for collapse(3)
for (i=0; i<n; i++)
for (j=0; j<m; j++)
for (k=0; k<l; k++) {
S
}
=>
{
  $domain loop_domain = {0..n-1, 0..m-1, 0..l-1};
  $domain(3) my_iters = ($domain(3))$omp_arrive_loop(team, FOR_LOC++, loop_domain, STRATEGY);
  $for (int i, j, k : my_iters) {
    translate(S);
  }
  $omp_barrier_and_flush(team);
}
Translating reduction clause
#pragma omp for reduction(+:x,y)
for (i=a; i<b; i++) {
S
}
=>
The following translation assumes that $omp_shared has already been created somewhere.
{
  $domain loop_domain = {a..b-1};
  $domain(1) my_iters = ($domain(1))$omp_arrive_loop(team, FOR_LOC++, loop_domain, STRATEGY);
  double _x=0.0, _y=0.0; // not the local view of the shared variable x/y
  $for (int i : my_iters) {
    translate(S) but replace x with _x and y with _y;
  }
  $omp_apply_assoc(x_shared, CIVL_SUM, &_x);
  $omp_apply_assoc(y_shared, CIVL_SUM, &_y);
  $omp_barrier_and_flush(team);
}
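The effect of the reduction translation can be checked with a small sequential simulation in C. The sketch below makes several assumptions that are not in the translation itself: the function name reduce_sum is hypothetical, the loop body is taken to be x += i, iterations are distributed round-robin, and the atomic $omp_apply_assoc step is stood in for by a plain addition because the simulated threads run one after another.

```c
#include <assert.h>

/* Simulates "#pragma omp for reduction(+:x)" over i = a..b-1
   with loop body "x += i", using nthreads simulated threads. */
double reduce_sum(double x_initial, int a, int b, int nthreads) {
    double x = x_initial;               /* shared copy of x */
    for (int tid = 0; tid < nthreads; tid++) {
        double _x = 0.0;                /* private accumulator, identity of + */
        for (int i = a + tid; i < b; i += nthreads)
            _x += (double)i;            /* body with x replaced by _x */
        x += _x;                        /* stands in for $omp_apply_assoc */
    }
    return x;
}
```

Starting each private accumulator at the identity of the operator is what preserves the original value of x: reduce_sum(100.0, 0, 10, 3) combines the initial 100 with the partial sums of the three simulated threads, and the result does not depend on the number of threads.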
Translating sections
Say there are numSections sections. This number is known statically.
#pragma omp sections
#pragma omp section
S0
#pragma omp section
S1
...
=>
{
  $domain(1) my_secs = $omp_arrive_sections(team, SEC_LOC++, numSections);
  $for (int i : my_secs) {
    switch (i) {
      case 0: {
        translate(S0);
        break;
      }
      case 1: {
        translate(S1);
        break;
      }
      ...
    } /* end of switch */
  } /* end of $for loop */
  $omp_barrier_and_flush(team);
}
Translating single
#pragma omp single
S
=>
int owner = $omp_arrive_single(team, SINGLE_LOC++);
if (owner == _tid) {
  translate(S);
}
$omp_barrier_and_flush(team);
Translating barrier
#pragma omp barrier
=>
$omp_barrier_and_flush(team);
Translating critical
Basically, use a lock for each critical name, plus one for the "no name" case. Each thread must obtain the lock to enter the critical section and release it on exit.
I.e., if there are critical sections named a, b, and c, there should be global root-scope variables of boolean type named _critical_noname, _critical_a, etc.
#pragma omp critical(a)
S
=>
...
_Bool _critical_a = $false;
. . .
$when (!_critical_a) _critical_a=$true;
translate(S);
_critical_a=$false;
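The guarded assignment $when (!_critical_a) _critical_a=$true; behaves like an atomic test-and-set on a boolean lock. For readers more familiar with C, a rough analogue using the C11 atomic_flag type is sketched below. The function names are hypothetical, the blocking guard is approximated by a spin loop, and the demo is single-threaded, so this shows only the lock discipline and not real contention.

```c
#include <assert.h>
#include <stdatomic.h>

/* One lock per critical name; this one plays the role of _critical_a. */
static atomic_flag critical_a = ATOMIC_FLAG_INIT;

/* Non-blocking attempt: returns 1 if the lock was free and is now held. */
int critical_try_enter(atomic_flag *lock) {
    return !atomic_flag_test_and_set(lock);
}

/* Blocking entry: spins until the lock is obtained,
   approximating "$when (!_critical_a) _critical_a=$true;". */
void critical_enter(atomic_flag *lock) {
    while (atomic_flag_test_and_set(lock)) {
        /* spin until the holder releases the lock */
    }
}

/* Exit: corresponds to "_critical_a=$false;". */
void critical_leave(atomic_flag *lock) {
    atomic_flag_clear(lock);
}

/* Returns 1 if the lock admits one holder at a time. */
int critical_demo(void) {
    int first = critical_try_enter(&critical_a);    /* lock was free */
    int second = critical_try_enter(&critical_a);   /* already held */
    critical_leave(&critical_a);
    int third = critical_try_enter(&critical_a);    /* free again */
    critical_leave(&critical_a);
    return first && !second && third;
}
```

A second attempt to enter while the lock is held fails, and succeeds again once the holder has left, which is the mutual exclusion property the translation relies on.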
Translating atomic
In general, reads and writes to shared variables will be processed using the protocols described above. However if the operation occurs within an omp atomic construct, it is translated differently.
TODO: need to look up the rules on the different flavors of atomics.
If sequentially consistent atomic...
If non-sequentially consistent atomic...
Translating ordered
This can only be used inside an OMP for loop in which the pragma used the ordered clause. (Check that.) It indicates that the specified region must be executed in iteration order.
In this case the system function must return an int iterator in which the ints occur in loop order.
#pragma omp for ordered
for (i=a; i<b; i++) {
...
#pragma omp ordered
S1
...
#pragma omp ordered
S2
...
}
=>
{
  $domain loop_domain = {a..b-1};
  $domain(1) my_iters = ($domain(1))$omp_arrive_loop(team, FOR_LOC++, loop_domain, STRATEGY);
  int order1=a, order2=a;
  $for (int i : my_iters) {
    ...
    $when (order1==i) {
      translate(S1);
      order1++;
    }
    ...
    $when (order2==i) {
      translate(S2);
      order2++;
    }
    ...
  }
}
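The role of the guard in the ordered translation can be illustrated with a sequential simulation in C. The sketch assumes a single ordered region, a round-robin distribution of iterations, and a counter that is visible to every thread in the team; that last point is an assumption of this sketch, since a guard on a counter private to one thread could never be enabled by another thread. The function names are hypothetical.

```c
#include <assert.h>

#define MAX_THREADS 8
#define MAX_ITERS 64

/* Simulates the guard "$when (order == i)" for iterations a..b-1
   distributed round-robin over nthreads simulated threads.
   Records in out[] the order in which iterations ran the ordered
   region; returns how many ran, or -1 if no thread can progress. */
int run_ordered(int a, int b, int nthreads, int *out) {
    int next[MAX_THREADS];   /* next iteration each thread wants to run */
    int order = a;           /* counter shared by the team (assumption) */
    int done = 0;
    assert(nthreads >= 1 && nthreads <= MAX_THREADS);
    for (int t = 0; t < nthreads; t++)
        next[t] = a + t;
    while (order < b) {
        int progressed = 0;
        /* Threads are polled in reverse order to show that the guard,
           not the polling order, fixes the execution order. */
        for (int t = nthreads - 1; t >= 0; t--) {
            if (next[t] < b && next[t] == order) {   /* guard is enabled */
                out[done++] = next[t];               /* translate(S) */
                order++;
                next[t] += nthreads;
                progressed = 1;
            }
        }
        if (!progressed)
            return -1;
    }
    return done;
}

/* Returns 1 if the ordered region ran for a, a+1, ..., b-1 in that order. */
int ordered_is_sequential(int a, int b, int nthreads) {
    int out[MAX_ITERS];
    assert(b - a <= MAX_ITERS);
    int n = run_ordered(a, b, nthreads, out);
    if (n != b - a)
        return 0;
    for (int k = 0; k < n; k++)
        if (out[k] != a + k)
            return 0;
    return 1;
}
```

Whatever order the simulated threads are polled in, a thread can only pass the guard when the counter equals its current iteration, so the ordered region is executed in loop order.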
Translating master
#pragma omp master
S
=>
if (_tid == 0) {
  translate(S);
}
Translating nowait
Just leave out the $omp_barrier_and_flush at the end of the translated construct.
Translating Orphan Functions
TODO
Translating functions
omp_get_num_threads() => _nthreads
omp_get_thread_num() => _tid
