wiki:OpenMPTransformation

Version 37 (modified by siegel, 12 years ago)

--

OpenMP Primitives

Constructs

  • parallel
  • worksharing
    • for
    • sections and section
    • single
  • synchronization
    • barrier
    • critical
    • atomic
    • ordered
    • master
  • threadprivate
  • flush

Clauses

  • private(list)
  • firstprivate(list)
  • lastprivate(list)
  • copyin(list)
  • shared(list)
  • default(none|shared)
  • num_threads(n)
  • collapse(n)
  • schedule(static, n)
  • schedule(dynamic, n)
  • ...
  • ordered
  • nowait
  • reduction(op:list)

Functions

  • omp_get_num_threads()
  • omp_get_thread_num()

Support Types

  • $omp_gteam: global team object, represents a team of threads executing in a parallel region. A handle type.
  • $omp_team: local object belonging to a single thread and referencing the global team object. A handle type.
  • $omp_gshared: global shared object, used to represent the state of a shared variable. A handle type.
  • $omp_shared: local view of the shared object, belonging to a single thread, with a reference to the global object. A handle type.
  • $omp_ref: a reference to a shared object or to some part of a shared object.

Support Functions

Team creation and destruction

  • $omp_gteam $omp_gteam_create($scope scope, int threads)
  • void $omp_gteam_destroy($omp_gteam gteam)
  • $omp_team $omp_team_create($scope scope, $omp_gteam gteam, int tid)
  • void $omp_team_destroy($omp_team team)

Shared variables

  • $omp_gshared $omp_gshared_create($omp_gteam, void *original)
  • void $omp_gshared_destroy($omp_gshared gshared)
  • $omp_shared $omp_shared_create($omp_team team, $omp_gshared gshared)
  • void $omp_shared_destroy($omp_shared shared)
  • $omp_ref $omp_identity_ref($omp_shared shared)
  • $omp_ref $omp_array_element_ref($omp_ref ref, int index)
  • $omp_ref $omp_member_ref($omp_ref, int fieldIndex)
  • void $omp_read(void *result, $omp_ref ref)
    • called by a thread to read a shared object pointed to by ref. The result of the read is stored in the memory unit pointed to by result.
  • void $omp_write($omp_ref ref, void *value)
    • called by a thread to write to the shared object pointed to by ref. The value to be written is taken from the memory unit pointed to by value.
  • void $omp_flush($omp_shared shared)
    • performs an OpenMP flush operation on the shared object
  • void $omp_flush_all($omp_team)
    • performs an OpenMP flush operation on all shared objects. This is the default in OpenMP if no argument is specified for a flush construct.

Worksharing and barriers

  • void $omp_barrier($omp_team team)
    • performs a barrier only. Note however that usually (always?) a barrier is accompanied by a flush-all, so $omp_barrier_and_flush should be used instead.
  • void $omp_barrier_and_flush($omp_team team)
    • combines a barrier and a flush on all shared objects owned by the team. Implicit in many OpenMP worksharing constructs.
  • $domain $omp_arrive_loop($omp_team, $domain loop_dom)
    • called by a thread when it reaches an omp for loop, this function returns the subset of the loop domain specifying the iterations that this thread will execute.
  • $domain $omp_arrive_sections($omp_team, int numSections)
    • called by a thread when it reaches an omp sections construct, this function returns the subset of the integers 0..numSections-1 specifying the indexes of the sections that this thread will execute. The sections are numbered from 0 in increasing order.
  • int $omp_arrive_single($omp_team team)
    • called by a thread when it reaches an omp single construct, returns the thread ID of the thread that will execute the single construct.

Memory model

This section describes how the OpenMP memory model is modeled. The protocols below are used in the implementations of the system functions dealing with shared objects.

For each shared variable v, introduce a second variable v_state. The type of v_state is obtained from the type of v by replacing every primitive type (leaf node in the type tree) with int. Initially all of these ints are -1. Both variables are declared in the same shared scope.

In addition to the shared variable v, each thread has its own local copy named _v, declared in thread private scope. It has the same type as v.
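The construction of v_state can be illustrated in plain C for a hypothetical struct-typed shared variable (the names point and point_state are made up for the example):

```c
#include <assert.h>

/* Hypothetical shared variable type: a struct with two primitive leaves. */
struct point { double x; double y; };

/* Its state type: same shape, but every primitive leaf is replaced by int.
   A state value of -1 means "no thread owns this memory unit". */
struct point_state { int x; int y; };

/* The shared copy, its state, and one thread's local copy (_v). */
struct point v;
struct point_state v_state = { -1, -1 };  /* initially all -1 */
struct point _v;
```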

Protocols for reads, writes, and flushes:

A write to (some part of) the shared variable by thread tid:

  • if the state value is -1, do the write to the local copy and set the state value to tid. Now thread tid is the "owner" of that memory unit.
  • if the state value is tid, do the write to the local copy.
  • else report a memory model error: you are attempting to write to a variable when some other thread has un-flushed writes to the same variable. The other thread should flush, then you should flush, before doing this write.

A read from (some part of) the shared variable by thread tid:

  • if the state value is -1, read your local copy and compare it to the global copy. If they differ, report a memory model error: some other thread has modified the variable and flushed, but you did not flush before performing the read. (If you had flushed, your local copy and the shared copy would be equal.)
  • if the state value is tid, read your local copy.
  • else report a memory model error: you are reading from a variable when another thread has un-flushed writes to that variable. The other thread should flush, and then you should flush, before doing this read.

A flush of (some part of) the shared variable by thread tid:

  • if the state value is -1: copy the global value to your local copy of the variable.
  • if the state value is tid: copy your local value to the global copy of the variable and set the state value to -1.
  • else: report a memory model error, since you are doing a flush when some other thread has un-flushed writes to the variable. The other thread should flush first.

The function $omp_barrier_and_flush performs a barrier on the team and a flush on all shared variables. After this completes, all local copies will agree with each other and with the shared copy of the variable, and all state variables will be -1.
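The three protocols above can be sketched as a minimal single-threaded C model for one scalar shared variable. All names (shared_write, shared_read, shared_flush, v_global, v_local, v_state) are hypothetical; a boolean return of false stands in for reporting a memory model error:

```c
#include <assert.h>
#include <stdbool.h>

#define NTHREADS 2
#define UNOWNED (-1)

/* One scalar shared variable: global copy, per-thread local copies,
   and a state value recording which thread (if any) owns it. */
static int v_global;
static int v_local[NTHREADS];
static int v_state = UNOWNED;

/* Write protocol: false signals a memory model error. */
static bool shared_write(int tid, int value) {
  if (v_state == UNOWNED) { v_local[tid] = value; v_state = tid; return true; }
  if (v_state == tid)     { v_local[tid] = value; return true; }
  return false;  /* another thread has unflushed writes */
}

/* Read protocol: the value is returned through *out. */
static bool shared_read(int tid, int *out) {
  if (v_state == UNOWNED) {
    if (v_local[tid] != v_global) return false;  /* stale copy: a flush was missed */
    *out = v_local[tid];
    return true;
  }
  if (v_state == tid) { *out = v_local[tid]; return true; }
  return false;  /* another thread has unflushed writes */
}

/* Flush protocol. */
static bool shared_flush(int tid) {
  if (v_state == UNOWNED) { v_local[tid] = v_global; return true; }
  if (v_state == tid)     { v_global = v_local[tid]; v_state = UNOWNED; return true; }
  return false;  /* the owning thread must flush first */
}
```

For example, a write by thread 0 followed by flushes from both threads lets thread 1 read the value, while a second write from thread 1 before thread 0 flushes is flagged as an error.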

Worksharing model

This section describes how the system functions dealing with worksharing are implemented.

The global data structure $omp_gteam contains a FIFO queue for each thread. The queue contains work-sharing records, one record for each work-sharing or barrier construct encountered. The record contains the basic information about the construct as provided by the arguments to the arrival function, as well as the distribution chosen for that thread.

The constructs are a lot like MPI collective operations, and are modeled similarly.

When a thread arrives at one of these constructs, it invokes the relevant arrival function. At that point one can determine whether this thread is the first to arrive at the construct: if its queue is empty, it is the first; otherwise it is not, and the oldest entry in its queue is the entry corresponding to this construct.

When a thread is the first thread to arrive at a construct, a distribution is chosen for every thread and a record is created and enqueued in each thread queue (including the caller). The distributions can be chosen nondeterministically, possibly with some restrictions to achieve some tractability/soundness compromise. The record for this thread is then dequeued and the iterator returned.

If a thread is not the first to arrive, its record is dequeued and compared with the arguments given in the function call. They should match, and if they don't, an error is reported. This indicates that either threads encountered constructs in different orders or the loop parameters changed.
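The queue discipline above can be sketched in C. This is a hypothetical simplification: records carry only the construct's source location plus the iteration count chosen for one thread, a fixed block split stands in for the nondeterministic choice of distribution, and a return of -1 stands in for reporting a mismatch error:

```c
#include <assert.h>
#include <stdbool.h>

#define NTHREADS 2
#define QCAP 8

/* A worksharing record: the construct's location and the number of
   iterations assigned to one thread. */
typedef struct { int location; int num_iters; } ws_record;

/* One FIFO queue per thread, as in the $omp_gteam structure. */
typedef struct { ws_record buf[QCAP]; int head, tail; } ws_queue;
static ws_queue queues[NTHREADS];

static bool q_empty(ws_queue *q) { return q->head == q->tail; }
static void q_push(ws_queue *q, ws_record r) { q->buf[q->tail++ % QCAP] = r; }
static ws_record q_pop(ws_queue *q) { return q->buf[q->head++ % QCAP]; }

/* Arrival at a loop with total_iters iterations at source 'location'.
   Returns this thread's iteration count, or -1 on a mismatch (threads
   encountered constructs in different orders). */
static int arrive_loop(int tid, int location, int total_iters) {
  if (q_empty(&queues[tid])) {
    /* First arriver: choose a distribution and enqueue a record for
       every thread, including itself. */
    for (int t = 0; t < NTHREADS; t++) {
      int share = total_iters / NTHREADS + (t < total_iters % NTHREADS ? 1 : 0);
      ws_record r = { location, share };
      q_push(&queues[t], r);
    }
  }
  ws_record r = q_pop(&queues[tid]);
  if (r.location != location) return -1;  /* record does not match the call */
  return r.num_iters;
}
```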

Translations of specific directives

Translating parallel

parallel: this spawns some nondeterministic number of threads. We will assume there is a constant THREAD_MAX defined somewhere. The number of threads created will be between 1 and THREAD_MAX (inclusive). Each thread is assigned an ID. The original ("master") thread has ID 0. All threads execute the parallel region.

  #pragma omp parallel ...
  S

=>

  { // begin parallel construct
    int _nthreads = 1+$choose_int(THREAD_MAX);
    $omp_gteam gteam = $omp_gteam_create($here, _nthreads);
    // declare shared variables and create shared objects

    $parfor (int _tid : 0.._nthreads-1) {
      $omp_team team = $omp_team_create($here, gteam, _tid);
      // declare local copies of shared variables
      // declare private variables

      translate(S) but replace each private variable `x` with `_x`, and 
        translate access to shared variables using protocols above;
      flush any writes to shared variables;
    } // end $parfor
  } // end parallel construct

All variables that occur in the parallel construct, i.e., the lexical extent of the parallel construct, must be determined to be either private or shared. This is determined by the clauses and the default rules as specified in the OpenMP Standard. Obviously any variable declared within the construct itself must be private.

For all private variables x not declared within the parallel construct, create a new variable of the same type, _x. The new variable is declared within the thread scope. If x is also firstprivate, then _x is initialized with the value of x, e.g. int _x=x;. Otherwise, _x is uninitialized, so has an undefined value.
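A concrete instance of the privatization rule, written as plain C (thread_body is a hypothetical stand-in for the translated body of one thread):

```c
#include <assert.h>

/* Original (outer) variables. */
int x = 5;  /* listed as firstprivate in the construct */
int y;      /* listed as private in the construct */

/* Translated body of one thread. */
int thread_body(void) {
  int _x = x;  /* firstprivate: _x is initialized from the original x */
  int _y;      /* private: _y starts with an undefined value */
  _y = 0;      /* so it must be assigned before being read */
  return _x + _y;
}
```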

Translating for

First try to determine whether the loop iterations are independent; if so, they can all be executed by one thread. Otherwise:

// location 23:
#pragma omp parallel for
for (i=0; i<n; i++)  
  S

=>

{
  CIVL_omp_loop_info info = $omp_ws_arrive_loop(_ws, 23);

  int numIters = info.numIters;
  for (int j=0; j<numIters; j++) {
    int i = info.iters[j][0];

    translate(S);
  }
  barrier_and_flush();
}

We can vary the way iterators are chosen to explore different tradeoffs and strategies. On one extreme, every kind of partition can be explored; on the other, some fixed strategy like round-robin with chunksize 1 can be used. This only changes the definition of $omp_ws_arrive_loop, not the translation above.
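The fixed round-robin strategy with chunk size 1 mentioned above can be sketched as a C function mapping (tid, nthreads, loop bound) to the thread's iteration list (the name round_robin is made up for the example):

```c
#include <assert.h>

/* Round-robin distribution with chunk size 1 for the loop
   for (i = 0; i < n; i++): thread tid gets iterations
   tid, tid + nthreads, tid + 2*nthreads, ...
   Fills iters[] and returns the number of iterations assigned. */
static int round_robin(int tid, int nthreads, int n, int iters[]) {
  int count = 0;
  for (int i = tid; i < n; i += nthreads)
    iters[count++] = i;
  return count;
}
```

With this as the body of $omp_ws_arrive_loop, every thread's info.iters would be fully determined; an exploratory definition would instead choose the partition nondeterministically.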

// location 78:
#pragma omp parallel for collapse(3)
for (i=0; i<n; i++)
  for (j=0; j<m; j++)
    for (k=0; k<l; k++) {
      S
    }

=>

{
  CIVL_omp_loop_info info = $omp_ws_arrive_loop(_ws, 78);

  int numIters = info.numIters;
  for (int count=0; count<numIters; count++) {
    int i = info.iters[count][0];
    int j = info.iters[count][1];
    int k = info.iters[count][2];

    translate(S);
  }
  barrier_and_flush();
}
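The iters tuples used in the collapse(3) translation can be understood as a flattening of the n*m*l iteration space. A C sketch of the inverse mapping, from a flat count back to (i, j, k) (the name unflatten is hypothetical; the actual tuples are produced by the arrival function):

```c
#include <assert.h>

/* For collapse(3) over i in [0,n), j in [0,m), k in [0,l), the n*m*l
   iterations are numbered 0..n*m*l-1 in loop order; this maps a flat
   count back to its (i, j, k) tuple, as stored in info.iters[count]. */
static void unflatten(int count, int m, int l, int *i, int *j, int *k) {
  *i = count / (m * l);
  *j = (count / l) % m;
  *k = count % l;
}
```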

Translating reduction clause

#pragma omp for reduction(+:x,y)
for (i=a; i<b; i++) {
  S
}

=>

{
  CIVL_omp_loop_info info = $omp_ws_arrive_loop(_ws, 23);
  double _x=0.0, _y=0.0;

  int numIters = info.numIters;
  for (int _count=0; _count<numIters; _count++) {
    int i = info.iters[_count][0];

    translate(S) but replace x with _x and y with _y;
  }
  x += _x;
  y += _y;
  // note: do something with POR so it knows the operations above from
  // different threads commute
  barrier_and_flush();
}
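The essential property the translation relies on, namely that per-thread partials start from the operator's identity and combine in any order, can be checked with a small C model (reduce_sum is a made-up name; it simulates all threads sequentially under a round-robin split):

```c
#include <assert.h>

/* Models reduction(+:x): each simulated thread accumulates into its
   own _x starting from the identity 0, and the partials are combined
   into x afterwards.  The combining steps commute, so the final x is
   independent of the order in which threads perform them. */
static int reduce_sum(const int vals[], int n, int nthreads) {
  int x = 0;
  for (int t = 0; t < nthreads; t++) {
    int _x = 0;                            /* identity for + */
    for (int i = t; i < n; i += nthreads)  /* this thread's iterations */
      _x += vals[i];
    x += _x;                               /* combine partial into x */
  }
  return x;
}
```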

Translating sections

If there are n sections, create n functions: section0, section1, .... Again the question is how to distribute them among threads and in what order. As with loops, one really wants to check that the sections are independent and explore interleavings only as a last resort.

// location 42:
#pragma omp sections
#pragma omp section
  S0
#pragma omp section
  S1
...

=>

{
  $int_iter iter = $omp_ws_arrive_sections(_ws, 42);

  while ($int_iter_hasNext(iter)) {
    int _i = $int_iter_next(iter);

    switch (_i) {
    case 0: {
      translate(S0);
      break;
    }
    case 1: {
      translate(S1);
      break;
    }
    ...
    } /* end of switch */
  } /* end of while loop */
  barrier_and_flush();
}

Translating single

// location 33:
#pragma omp single
S

=>

if ($omp_arrive_single(_ws, 33)) {
  translate(S);
}
barrier_and_flush();

Translating barrier

// location 58:
#pragma omp barrier

=>

$omp_barrier_arrive(_ws, 58);
barrier_and_flush();

Translating critical

Basically, use a lock for each critical name, plus one for the unnamed case. A thread must obtain the lock to enter the critical section and release it on exit. I.e., if there are critical sections named a, b, and c, there should be global root-scope variables of boolean type named _critical_noname, _critical_a, etc.

#pragma omp critical (a)
S

=>

...
_Bool _critical_a = $false;
...
$when (!_critical_a) _critical_a=$true;
translate(S);
_critical_a=$false;
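The lock discipline of the translation can be modeled in sequential C, with an explicit boolean flag playing the role of _critical_a. The function names are hypothetical, and a false return stands in for the blocking behavior of $when:

```c
#include <assert.h>
#include <stdbool.h>

/* One flag per critical name, as described above. */
static bool _critical_a = false;

/* Try to enter the critical section named a.  In CIVL-C,
   $when (!_critical_a) would block; here we return false instead. */
static bool critical_a_enter(void) {
  if (_critical_a) return false;  /* lock held by another thread */
  _critical_a = true;
  return true;
}

static void critical_a_exit(void) { _critical_a = false; }
```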

Translating atomic

In general, reads and writes to shared variables will be processed using the protocols described above (dealing with state variables and ownership, etc.). However if the operation occurs within an omp atomic construct, it is translated differently.

If sequentially consistent atomic...

If non-sequentially consistent atomic...

Translating ordered

This can only be used inside an OMP for loop whose pragma has the ordered clause. (Check that.) It indicates that the specified region must be executed in iteration order.

In this case the system function must return an int iterator in which the ints occur in loop order.

#pragma omp for ordered
for (i=a; i<b; i++) {
  ...
  #pragma omp ordered
  S1
  ...
  #pragma omp ordered
  S2
  ...
}

=>

{
  CIVL_omp_loop_info info = $omp_ws_arrive_loop(_ws, 23);
  int order1=a, order2=a;
  int numIters = info.numIters;

  for (int _i=0; _i<numIters; _i++) {
    int i = info.iters[_i][0];
     ...
    $when (order1==i) {
      translate(S1);
      order1++;
    }
      ...
    $when (order2==i) {
      translate(S2);
      order2++;
    }
      ...
  }
}

Translating master

#pragma omp master
S

=>

if (_tid == 0) {
  translate(S);
}

Translating functions

  • omp_get_num_threads() => _nthreads
  • omp_get_thread_num() => _tid