Version 43 (modified 12 years ago)
OpenMP Primitives
Constructs
- parallel
- worksharing
  - for
  - sections and section
  - single
- synchronization
  - barrier
  - critical
  - atomic
  - ordered
  - master
- threadprivate
- flush
Clauses
- private(list)
- firstprivate(list)
- lastprivate(list)
- copyin(list)
- shared(list)
- default(none|shared)
- num_threads(n)
- collapse(n)
- schedule(static, n)
- schedule(dynamic, n)
- ...
- ordered
- nowait
- reduction
Functions
- omp_get_num_threads()
- omp_get_thread_num()
Support Types
- $omp_gteam: global team object, represents a team of threads executing in a parallel region. A handle type.
- $omp_team: local object belonging to a single thread and referencing the global team object. A handle type.
- $omp_gshared: global shared object, used to represent the state of a shared variable. A handle type.
- $omp_shared: local view of the shared object, belonging to a single thread, with reference to the global object. A handle type.
- $omp_ref: a reference to a shared object or some part of a shared object.
Support Functions
Team creation and destruction
$omp_gteam $omp_gteam_create($scope scope, int nthreads)
- creates a new global team object, allocating the object in the heap of the specified scope. The number of threads that will be in the team is nthreads.
void $omp_gteam_destroy($omp_gteam gteam)
- destroys the global team object. All shared objects associated to the team must have been destroyed before calling this function.
$omp_team $omp_team_create($scope scope, $omp_gteam gteam, int tid)
- creates a new local team object for a specific thread.
void $omp_team_destroy($omp_team team)
- destroys the local team object.
Shared variables
$omp_gshared $omp_gshared_create($omp_gteam gteam, void *original)
- creates a new global shared object, associated to the given global team. A pointer to the shared variable that this object corresponds to is given. The new object is initialized by copying the values from the original variable.
void $omp_gshared_destroy($omp_gshared gshared)
- destroys the global shared object, copying its contents back to the original variable.
$omp_shared $omp_shared_create($omp_team team, $omp_gshared gshared)
- creates a local shared object, returning a handle to it.
void $omp_shared_destroy($omp_shared shared)
- destroys the local shared object.
$omp_ref $omp_identity_ref($omp_shared shared)
- creates a reference to the specified shared object.
$omp_ref $omp_array_element_ref($omp_ref ref, int index)
- creates a reference to an element of a shared array object. Argument ref is a reference to a shared object of array type.
$omp_ref $omp_member_ref($omp_ref ref, int fieldIndex)
- creates a reference to a member of a structure or union shared object. Argument ref is a reference to the shared object of structure or union type.
void $omp_read(void *result, $omp_ref ref)
- called by a thread to read a shared object pointed to by ref. The result of the read is stored in the memory unit pointed to by result.
void $omp_write($omp_ref ref, void *value)
- called by a thread to write to the shared object pointed to by ref. The value to be written is taken from the memory unit pointed to by value.
void $omp_apply_assoc($omp_ref ref, $operator op, void *value)
- reads the reference, applies the associative operator specified by op to the read value and the value pointed to by value, and writes the result back to ref. This happens in one atomic step. Example: you can use this to add some value to a shared variable, using CIVL_SUM for op.
void $omp_flush($omp_shared shared)
- performs an OpenMP flush operation on the shared object.
void $omp_flush_all($omp_team team)
- performs an OpenMP flush operation on all shared objects. This is the default in OpenMP if no argument is specified for a flush construct.
Worksharing and barriers
void $omp_barrier($omp_team team)
- performs a barrier only. Note, however, that usually (always?) a barrier is accompanied by a flush-all, so $omp_barrier_and_flush should be used instead.
void $omp_barrier_and_flush($omp_team team)
- combines a barrier and a flush on all shared objects owned by the team. Implicit in many OpenMP worksharing constructs.
$domain $omp_arrive_loop($omp_team team, $domain loop_dom)
- called by a thread when it reaches an omp for loop. Returns the subset of the loop domain specifying the iterations that this thread will execute.
$domain $omp_arrive_sections($omp_team team, int numSections)
- called by a thread when it reaches an omp sections construct. Returns the subset of the integers 0..numSections-1 specifying the indexes of the sections that this thread will execute. The sections are numbered from 0 in increasing order.
int $omp_arrive_single($omp_team team)
- called by a thread when it reaches an omp single construct. Returns the thread ID of the thread that will execute the single construct.
Memory model
This section describes how the memory model is modeled. These protocols are used in the implementations of the system functions dealing with shared objects.
For each shared variable v introduce a second variable v_state. The type of v_state is obtained from the type of v by replacing all primitive types (leaf nodes in the type tree) by int. Initially all these ints are -1. Both variables are declared in the same, shared, scope.
In addition to the shared variable v, each thread has its own local copy named _v, declared in thread private scope. It has the same type as v.
Protocols for reads, writes, and flushes:
A write to (some part of) the shared variable by thread tid:
- if the state value is -1, do the write to the local copy and set the state value to tid. Now thread tid is the "owner" of that memory unit.
- if the state value is tid, do the write to the local copy.
- else report a memory model error: you are attempting to write to a variable when some other thread has un-flushed writes to the same variable. The other thread should flush, then you should flush, before doing this write.
A read from (some part of) the shared variable by thread tid:
- if the state value is -1, read your local copy and compare it to the global copy. If they differ, report a memory model error: some other thread has modified the variable and flushed, but you did not flush before performing the read. (If you had flushed, your local copy and the shared copy would be equal.)
- if the state value is tid, read your local copy.
- else report a memory model error: you are reading from a variable when another thread has un-flushed writes to that variable. The other thread should flush, and then you should flush, before doing this read.
A flush of (some part of) the shared variable by thread tid:
- if the state value is -1: copy the global value to your local copy of the variable.
- if the state value is tid: copy your local value to the global copy of the variable and set the state value to -1.
- else: report a memory model error, since you are doing a flush when some other thread has un-flushed writes to the variable. The other thread should flush first.
The function $omp_barrier_and_flush performs a barrier on the team and a flush on all shared variables. After this completes, all local copies will agree with each other and with the shared copy of the variable, and all state variables will be -1.
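The read, write, and flush protocols above can be sketched in plain C for a single int-valued memory unit. This is only an illustrative simulation, not the CIVL implementation: the names shared_unit, model_write, model_read, and model_flush are hypothetical, and a memory model error is represented simply by a false return value.

```c
#include <assert.h>
#include <stdbool.h>

#define NO_OWNER (-1)  /* state value meaning "no thread has un-flushed writes" */

/* One shared memory unit: the global copy of v plus its v_state entry. */
typedef struct {
    int global;  /* the shared (global) copy */
    int state;   /* NO_OWNER, or the tid of the owning thread */
} shared_unit;

/* Write by thread tid to its local copy. Returns false on a memory model error. */
bool model_write(shared_unit *u, int *local, int tid, int value) {
    assert(tid >= 0);  /* -1 is reserved for NO_OWNER */
    if (u->state == NO_OWNER) {
        u->state = tid;            /* tid becomes the owner */
    } else if (u->state != tid) {
        return false;              /* another thread has un-flushed writes */
    }
    *local = value;
    return true;
}

/* Read by thread tid from its local copy. Returns false on a memory model error. */
bool model_read(const shared_unit *u, const int *local, int tid, int *result) {
    assert(tid >= 0);
    if (u->state == NO_OWNER) {
        if (*local != u->global) {
            return false;          /* stale local copy: reader did not flush */
        }
    } else if (u->state != tid) {
        return false;              /* another thread has un-flushed writes */
    }
    *result = *local;
    return true;
}

/* Flush by thread tid. Returns false on a memory model error. */
bool model_flush(shared_unit *u, int *local, int tid) {
    assert(tid >= 0);
    if (u->state == NO_OWNER) {
        *local = u->global;        /* pull the global value */
    } else if (u->state == tid) {
        u->global = *local;        /* push the local value and release ownership */
        u->state = NO_OWNER;
    } else {
        return false;              /* the owning thread should flush first */
    }
    return true;
}
```

In this sketch, a write by thread 0 followed by a read by thread 1 fails until thread 0 flushes and then thread 1 flushes, which mirrors the ordering the protocol demands.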
Worksharing model
This section describes how the system functions dealing with worksharing are implemented.
The global data structure $omp_gteam contains a FIFO queue for each thread. The queue contains work-sharing records, one record for each work-sharing or barrier construct encountered. The record contains the basic information about the construct as provided by the arguments to the arrival function, as well as the distribution chosen for that thread.
The constructs are a lot like MPI collective operations, and are modeled similarly.
When a thread arrives at one of these constructs, it invokes the relevant arrival function. At this point you can determine whether this thread is the first to arrive at that construct. If its queue is empty, it is the first, otherwise it is not first, and the oldest entry in its queue will be the entry corresponding to this construct.
When a thread is the first thread to arrive at a construct, a distribution is chosen for every thread and a record is created and enqueued in each thread queue (including the caller). The distributions can be chosen nondeterministically, possibly with some restrictions to achieve some tractability/soundness compromise. The record for this thread is then dequeued and the iterator returned.
If a thread is not the first to arrive, its record is dequeued and compared with the arguments given in the function call. They should match, and if they don't, an error is reported. This indicates that either threads encountered constructs in different orders or the loop parameters changed.
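The arrival protocol described above can be sketched in plain C as follows. This is a simplified simulation under several assumptions: the team size is fixed at 3, queues have a fixed capacity, only loop constructs are recorded, and the distribution chosen by the first arriver is a fixed stride rather than a nondeterministic choice. The names gteam, ws_record, gteam_init, and arrive_loop are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

#define NTHREADS 3  /* assumed team size */
#define QCAP 8      /* assumed queue capacity */

/* A work-sharing record: the construct's arguments plus one thread's share. */
typedef struct {
    int lo, hi;       /* loop bounds passed to the arrival function */
    int first, step;  /* this thread runs first, first+step, ... below hi */
} ws_record;

/* Global team object: one FIFO queue of records per thread. */
typedef struct {
    ws_record q[NTHREADS][QCAP];
    int head[NTHREADS];
    int len[NTHREADS];
} gteam;

void gteam_init(gteam *g) {
    for (int t = 0; t < NTHREADS; t++) {
        g->head[t] = 0;
        g->len[t] = 0;
    }
}

static void enqueue(gteam *g, int tid, ws_record r) {
    assert(g->len[tid] < QCAP);  /* this sketch does not grow the queue */
    g->q[tid][(g->head[tid] + g->len[tid]) % QCAP] = r;
    g->len[tid]++;
}

/* Called when thread tid reaches a loop over [lo, hi).
   Returns false if the call does not match the recorded construct. */
bool arrive_loop(gteam *g, int tid, int lo, int hi, ws_record *out) {
    if (g->len[tid] == 0) {
        /* First to arrive: choose a share for every thread, caller included. */
        for (int t = 0; t < NTHREADS; t++) {
            ws_record r = { lo, hi, lo + t, NTHREADS };
            enqueue(g, t, r);
        }
    }
    /* Dequeue the oldest record; it corresponds to this construct. */
    ws_record r = g->q[tid][g->head[tid]];
    g->head[tid] = (g->head[tid] + 1) % QCAP;
    g->len[tid]--;
    if (r.lo != lo || r.hi != hi) {
        return false;  /* constructs met in a different order, or bounds changed */
    }
    *out = r;
    return true;
}
```

Because the first arriver fills every queue, a later thread only needs to compare its own arguments against the oldest record in its queue, much as in the modeling of MPI collectives.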
Translations of specific directives
Translating parallel
parallel: this spawns some nondeterministic number of threads. We will assume there is a constant THREAD_MAX defined somewhere. The number of threads created will be between 1 and THREAD_MAX (inclusive). Each thread is assigned an ID. The original ("master") thread has ID 0. All threads execute the parallel region.
float x; // shared
int y; // private
#pragma omp parallel shared(x) private(y)
{
  ...
  x=5.2;
  y=3;
  ...
}
=>
float x;
int y;
{ // begin parallel construct
  int _nthreads = 1+$choose_int(THREAD_MAX);
  $omp_gteam gteam = $omp_gteam_create($here, _nthreads);
  $omp_gshared x_gshared = $omp_gshared_create(gteam, &x);
  $parfor (int _tid : {0.._nthreads-1}) {
    $omp_team team = $omp_team_create($here, gteam, _tid);
    $omp_shared x_shared = $omp_shared_create(team, x_gshared);
    int _y; // private variable
    ...
    { // "x=5.2":
      float tmp = 5.2;
      $omp_write($omp_identity_ref(x_shared), &tmp);
    }
    _y = 3;
    ...
    $omp_barrier_and_flush(team); // implicit at end of parallel region
    $omp_shared_destroy(x_shared);
    $omp_team_destroy(team);
  } // end $parfor
  $omp_gshared_destroy(x_gshared);
  $omp_gteam_destroy(gteam);
} // end parallel construct
All variables that occur in the parallel construct, i.e., the lexical extent of the parallel construct, must be determined to be either private or shared. This is determined by the clauses and the default rules as specified in the OpenMP Standard. Obviously any variable declared within the construct itself must be private.
For all private variables y not declared within the parallel construct, create a new variable of the same type, _y. The new variable is declared within the thread scope. If y is also firstprivate, then _y is initialized with the value of y, e.g. int _y=y;. Otherwise, _y is uninitialized, so has an undefined value.
Translating for
Try to determine whether the loop iterations are independent. If they are, they can all be executed by one thread. Otherwise:
#pragma omp parallel for
for (i=0; i<n; i++) S
=>
{
  $domain loop_domain = {0..n-1};
  $domain my_iters = $omp_arrive_loop(team, loop_domain);
  $for (int i : my_iters) {
    translate(S);
  }
  $omp_barrier_and_flush(team);
}
We can vary the way the sub-domains are chosen to explore different tradeoffs and strategies. On one extreme, every kind of partition can be explored; on the other, some fixed strategy like round-robin with chunksize 1 can be used. This only changes the definition of $omp_arrive_loop, not the translation above.
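As one concrete point in that design space, the fixed round-robin strategy can be sketched in plain C. The function below is a hypothetical illustration (round_robin_iters and its parameters are not part of CIVL): it lists the iterations of the half-open range [lo, hi) that a given thread would receive for a given chunk size.

```c
#include <assert.h>
#include <stddef.h>

/* Writes to out the iterations in [lo, hi) assigned to thread tid when chunks
   of size chunk are dealt round-robin to nthreads threads. At most cap
   iterations are written. Returns the number written. */
size_t round_robin_iters(int lo, int hi, int nthreads, int tid, int chunk,
                         int *out, size_t cap) {
    assert(nthreads > 0 && chunk > 0 && tid >= 0 && tid < nthreads);
    size_t n = 0;
    /* Thread tid owns chunks tid, tid + nthreads, tid + 2*nthreads, ... */
    for (int start = lo + tid * chunk; start < hi; start += nthreads * chunk) {
        for (int i = start; i < start + chunk && i < hi && n < cap; i++) {
            out[n++] = i;
        }
    }
    return n;
}
```

With 3 threads, chunk size 1, and the range 0..6, thread 1 receives iterations 1 and 4. A fully nondeterministic strategy would instead explore every partition of the domain, at a much higher verification cost.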
#pragma omp parallel for collapse(3)
for (i=0; i<n; i++)
  for (j=0; j<m; j++)
    for (k=0; k<l; k++) {
      S
    }
=>
{
  $domain loop_domain = {0..n-1, 0..m-1, 0..l-1};
  $domain my_iters = $omp_arrive_loop(team, loop_domain);
  $for (int i, j, k : my_iters) {
    translate(S);
  }
  $omp_barrier_and_flush(team);
}
Translating reduction clause
#pragma omp for reduction(+:x,y)
for (i=a; i<b; i++) {
  S
}
=>
{
  $domain loop_domain = {a..b-1};
  $domain my_iters = $omp_arrive_loop(team, loop_domain);
  double _x=0.0, _y=0.0; // private accumulators, initialized to the identity for +
  $for (int i : my_iters) {
    translate(S) but replace x with _x and y with _y;
  }
  // $omp_apply_assoc takes an $omp_ref, so wrap the shared handles
  $omp_apply_assoc($omp_identity_ref(x_shared), CIVL_SUM, &_x);
  $omp_apply_assoc($omp_identity_ref(y_shared), CIVL_SUM, &_y);
  $omp_barrier_and_flush(team);
}
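The effect of the reduction translation can be illustrated with a sequential simulation in plain C. This is only a sketch under stated assumptions: the loop body is taken to be "x += i", threads are simulated one after another (so the apply-assoc step is trivially atomic), and iterations are split by a fixed stride. The names assoc_op, op_sum, apply_assoc, and simulate_sum_reduction are hypothetical stand-ins, not CIVL functions.

```c
#include <assert.h>

/* Hypothetical stand-in for $operator: an associative binary operation. */
typedef double (*assoc_op)(double, double);

static double op_sum(double a, double b) { return a + b; }

/* Stand-in for $omp_apply_assoc: read the shared value, combine, write back. */
static void apply_assoc(double *shared, assoc_op op, const double *value) {
    *shared = op(*shared, *value);
}

/* Simulates "reduction(+:x)" over i in [a, b) with the assumed body "x += i". */
double simulate_sum_reduction(int a, int b, int nthreads, double x_initial) {
    assert(nthreads > 0);
    double x = x_initial;  /* plays the role of the shared variable */
    for (int tid = 0; tid < nthreads; tid++) {
        double x_private = 0.0;  /* plays the role of _x; identity for + */
        for (int i = a + tid; i < b; i += nthreads) {
            x_private += i;      /* the body, with x replaced by the private copy */
        }
        apply_assoc(&x, op_sum, &x_private);  /* one combine step per thread */
    }
    return x;
}
```

Because the operator is associative and each thread combines its partial result in a single step, the final value does not depend on how the iterations were distributed (up to floating-point rounding in general; the integer-valued sums used here are exact).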
Translating sections
Say there are numSections sections. This number is known statically.
#pragma omp sections
#pragma omp section
S0
#pragma omp section
S1
...
=>
{
  $domain my_secs = $omp_arrive_sections(team, numSections);
  $for (int i : my_secs) {
    switch (i) {
      case 0: {
        translate(S0);
        break;
      }
      case 1: {
        translate(S1);
        break;
      }
      ...
    } /* end of switch */
  } /* end of $for loop */
  $omp_barrier_and_flush(team);
}
Translating single
#pragma omp single
S
=>
int owner = $omp_arrive_single(team);
if (owner == _tid) {
  translate(S);
}
$omp_barrier_and_flush(team);
Translating barrier
#pragma omp barrier
=>
$omp_barrier_and_flush(team);
Translating critical
Basically, use a lock for each critical name, plus one for the unnamed case. All threads must obtain the lock to enter the critical section, then release it on exit.
I.e., if there are critical sections named a, b, and c, there should be global root-scope variables of boolean type named _critical_noname, _critical_a, etc.
#pragma omp critical(a)
S
=>
...
_Bool _critical_a = $false;
...
$when (!_critical_a) _critical_a=$true;
translate(S);
_critical_a=$false;
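The guarded statement in this translation tests the flag and sets it in one atomic step. Outside CIVL, the same test-and-set behavior can be approximated with C11 atomics, as in the sketch below. The names critical_try_enter and critical_exit are hypothetical, and a non-blocking "try" is used here instead of the blocking semantics of $when.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* One flag per critical name, as in the translation (here: name "a"). */
static atomic_bool critical_a = false;

/* Atomically performs "if (!lock) lock = true" and reports whether it succeeded.
   A blocking $when would correspond to retrying until this returns true. */
bool critical_try_enter(atomic_bool *lock) {
    bool expected = false;
    return atomic_compare_exchange_strong(lock, &expected, true);
}

/* Releases the lock on leaving the critical section. */
void critical_exit(atomic_bool *lock) {
    bool was_held = atomic_exchange(lock, false);
    assert(was_held);  /* exiting without holding the lock is a usage error */
    (void)was_held;    /* avoid an unused-variable warning under NDEBUG */
}
```

A second attempt to enter fails while the first holder is inside, and succeeds again after critical_exit is called.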
Translating atomic
In general, reads and writes to shared variables will be processed using the protocols described above. However if the operation occurs within an omp atomic construct, it is translated differently.
TODO: need to look up the rules on the different flavors of atomics.
If sequentially consistent atomic...
If non-sequentially consistent atomic...
Translating ordered
This can only be used inside an omp for loop whose pragma uses the ordered clause. (Check that.) It indicates that the specified region must be executed in iteration order.
In this case the system function must return an int iterator in which the ints occur in loop order.
#pragma omp for ordered
for (i=a; i<b; i++) {
  ...
  #pragma omp ordered
  S1
  ...
  #pragma omp ordered
  S2
  ...
}
=>
{
  $domain loop_domain = {a..b-1};
  $domain my_iters = $omp_arrive_loop(team, loop_domain);
  int order1=a, order2=a;
  $for (int i : my_iters) {
    ...
    $when (order1==i) {
      translate(S1);
      order1++;
    }
    ...
    $when (order2==i) {
      translate(S2);
      order2++;
    }
    ...
  }
}
Translating master
#pragma omp master
S
=>
if (_tid == 0) {
  translate(S);
}
Translating functions
omp_get_num_threads() => _nthreads
omp_get_thread_num() => _tid
