wiki:OpenMPTransformation

Version 10 (modified by siegel, 12 years ago)


OpenMP Primitives

Constructs

  • parallel
  • worksharing
    • for
    • sections and section
    • single
  • synchronization
    • barrier
    • critical
    • atomic
    • ordered
    • master
  • threadprivate
  • flush

Clauses

  • private(list)
  • firstprivate(list)
  • lastprivate(list)
  • copyin(list)
  • shared(list)
  • default(none|shared)
  • num_threads(n)
  • schedule(static, n)
  • schedule(dynamic, n)
  • ...
  • ordered
  • nowait

Functions

  • omp_get_num_threads()
  • omp_get_thread_num()

Helper primitives

$int_iter is a handle type for an iterator of integers.

/* Tells whether the integer iterator has any more elements */
_Bool $int_iter_hasNext($int_iter iter);

/* Returns the next element in the iterator (and updates the iterator) */
int $int_iter_next($int_iter iter);

Worksharing state

The worksharing state will be stored in another handle-type object. The situation here is analogous to the $gcomm and $comm objects used for MPI, which store the shared state for message passing. We need a similar object for the shared state that coordinates the work-sharing and barrier constructs:

  • $omp_gws: global work-sharing state
  • $omp_ws: local state. A reference to a global object and a thread ID.

API:

/* Creates new global work-sharing state object, returning
 * handle to it.  nthreads is the number of threads in
 * the parallel region.  There is one of these per parallel region,
 * created upon entering the region */
$omp_gws $omp_gws_create($scope scope, int nthreads);

/* Destroys the global work-sharing state object. */
void $omp_gws_destroy($omp_gws gws);

/* Creates a local work-sharing object, which is basically
 * a pair consisting of a global work-sharing handle and
 * a thread id. */
$omp_ws $omp_ws_create($omp_gws gws, int tid);

/* Destroys the local work-sharing object. */
void $omp_ws_destroy($omp_ws ws);

/* for "for" loops only: called when a thread arrives, it
 * returns the sequence of loop iterations to be performed by
 * the thread.  Parameter location is the ID of the model location
 * of the top of the loop.  It is needed to check that all threads
 * encounter the same worksharing statements in the same order.
 * Parameter start is the initial value of the loop variable;
 * end is its final value; and inc is the increment (which can be
 * positive or negative). */
$int_iter $omp_ws_arrive_loop($omp_ws ws, int location, int start, int end, int inc);

/* for sections: called at arrival, returns the sequence of sections to
 * be executed by calling thread.  The sections are numbered in order,
 * starting from 0. */
$int_iter $omp_ws_arrive_sections($omp_ws ws, int location);

/* for single: called on arrival, returns whether or not to execute
 * the single code */
_Bool $omp_ws_arrive_single($omp_ws ws, int location);

/* called when arriving at a barrier.  This does not
 * impose the barrier, you still need to call system function
 * $barrier... for that.  This is needed to ensure all threads
 * in the team call the same sequence of worksharing and barrier
 * constructs.  */
void $omp_ws_arrive_barrier($omp_ws ws, int location);
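A sketch of how a translated thread body might use this API; here LOC1, LOC2, gws, and n are placeholder names, not part of the API:

```
void _thread(int _tid) {
  $omp_ws ws = $omp_ws_create(gws, _tid);
  /* translated worksharing "for" at model location LOC1: */
  $int_iter iter = $omp_ws_arrive_loop(ws, LOC1, 0, n-1, 1);
  while ($int_iter_hasNext(iter)) {
    int i = $int_iter_next(iter);
    /* ... loop body for iteration i ... */
  }
  /* implicit barrier at the end of the for construct: */
  $omp_ws_arrive_barrier(ws, LOC2);
  /* ... followed by the actual $barrier call ... */
  $omp_ws_destroy(ws);
}
```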

Translation Strategies

Translating shared variables

Translating parallel

parallel: this spawns a nondeterministic number of threads. We will assume there is a constant THREAD_MAX defined somewhere. The number of threads created will be between 1 and THREAD_MAX (inclusive). Each thread is assigned an ID; the original ("master") thread has ID 0. All threads execute the parallel region.

  #pragma omp parallel ...
  S

=>

  {
    int _nthreads = 1+$choose_int(THREAD_MAX);
    $proc _threads[_nthreads];
    void _thread(int _tid) {
      translate(S)
    }

    for (int i=0; i<_nthreads; i++) _threads[i]=$spawn _thread(i);
    for (int i=0; i<_nthreads; i++) $wait(_threads[i]);
  }

All variables that occur in the parallel construct, i.e., the lexical extent of the parallel construct, must be determined to be either private or shared. This is determined by the clauses and the default rules as specified in the OpenMP Standard. Obviously any variable declared within the construct itself must be private.

For all private variables x not declared within the parallel construct, create a new variable of the same type, _x. The new variable is declared within the thread scope. If x is also firstprivate, then _x is initialized with the value of x, e.g. int _x=x;. Otherwise, _x is uninitialized, so has an undefined value.

Translating for

Try to determine whether the loop iterations are independent. If they are, they can all be executed by a single thread without loss of generality.

Otherwise, iterations must be distributed among the threads in some nondeterministic way. This could blow up rapidly! Also, a thread does not have to execute its iterations in increasing order. It can execute them in any order.

Trying a few different things for now: picking a particular scheduling policy like round-robin (static, with chunk size 1). Of course you can always do this if the schedule is specified to be static.

The question is do we ever want to try to explore these interleavings?

Is there any loss of generality by just running all iterations concurrently?

One approach: assume you have a function or macro CIVL_owns(n, t, i). It takes three ints and returns a boolean. The arguments are n: the number of threads; t: a thread ID between 0 and n-1 (inclusive); and i, an iteration index.

#pragma omp parallel for
  for (i...)
  S

=>

for (i...) {
  if (CIVL_owns(_nthreads, _tid, i)) {
    translate(S)
  }
}
barrier (unless nowait)

More general way:

  {
    // use distributions
  }


Translating sections

If there are n sections, create n functions: section0, section1, .... Again the question is how to distribute them among threads and in what order. As with loops, you really want to check that the sections are independent and only fall back to exploring interleavings as a last resort.

#pragma omp sections
  {
  #pragma omp section
  ...
  #pragma omp section
  ...
  }

=>

  {
    void section0() {
      ...
    }
    void section1() {
      ...
    }
    ...
    if (CIVL_owns(_nthreads, _tid, 0))
      section0();
    if (CIVL_owns(_nthreads, _tid, 1))
      section1();
    ...
    barrier unless nowait;
  }

Translating single

Nondeterministically choose a thread, i.e., $choose_int(nthreads). That thread executes the code; the rest skip it. The question is, which thread does the choosing? The first thread to arrive at the construct?

Once again, try to determine if it matters. If the modifications and reads do not involve any private data, it doesn't matter which thread does it, so make it thread 0.

There is a barrier at the end.

Translating barrier

Provide some system functions for this. All the threads in the team (the _threads[i]) register with a barrier object and participate in the barrier. The same barrier object can be re-used for multiple barriers.

Translating critical

Basically, use a lock for each critical name, plus one for unnamed critical sections. A thread must obtain the lock to enter the critical section and release it on exit.

Translating atomic

This is just $atomic.

Translating ordered

This can only be used inside an OpenMP for loop whose pragma used the ordered clause. (Check that.) It indicates that the specified region must be executed in iteration order.

#pragma omp for ordered
for (i=a; i<b; i++) {
  ...
  #pragma omp ordered
  S1
  ...
  #pragma omp ordered
  S2
  ...
}

=>

{
  int order1=a, order2=a;
  for (i=a; i<b; i++) {
    if (CIVL_owns(_nthreads, _tid, i)) {
      ...
      $when (order1==i) {
        translate(S1);
        order1++;
      }
      ...
      $when (order2==i) {
        translate(S2);
        order2++;
      }
      ...
    }
  }
}