Changes between Version 28 and Version 29 of Next-GenOpenMPTransformation


Ignore:
Timestamp:
06/25/19 16:50:06 (7 years ago)
Author:
siegel
Comment:

--

Legend:

Unmodified
Added
Removed
Modified
  • Next-GenOpenMPTransformation

    v28 v29  
    353353* `void $omp_flush_all($omp_team)`
    354354 * performs an OpenMP flush operation on all shared objects.  This is the default in OpenMP if no argument is specified for a flush construct.
     355
     356=== Worksharing and barriers ===
     357
     358* `void $omp_barrier($omp_team team)`
     359 * performs a barrier only.  Note however that usually (always?) a barrier is accompanied by a flush-all, so `$omp_barrier_and_flush` should be used instead.
     360
     361* `void $omp_barrier_and_flush($omp_team team)`
     362 * combines a barrier and a flush on all shared objects owned by the team.  Implicit in many OpenMP worksharing constructs.
     363
     364* `$domain $omp_arrive_loop($omp_team team, int location, $domain loop_dom,  $DecompositionStrategy strategy)`
     365 * called by a thread when it reaches an omp for loop, this function returns the subset of the loop domain specifying the iterations that this thread will execute.  The dimension of the domain returned equals the dimension of the given domain `loop_dom`.
     366
     367* `$domain(1) $omp_arrive_sections($omp_team team, int location, int numSections)`
     368 * called by a thread when it reaches an omp sections construct, this function returns the subset of the integers 0..numSections-1 specifying the indexes of the sections that this thread will execute.  The sections are numbered from 0 in increasing order.
     369
     370* `int $omp_arrive_single($omp_team team, int location)`
     371 * called by a thread when it reaches on omp single construct, returns the thread ID of the thread that will execute the single construct.
     372
     373
     374== Worksharing model ==
     375
     376This section describes how the system functions dealing with worksharing are implemented.
     377
     378The global data structure `$omp_gteam` contains a FIFO queue for each thread.   The queue contains work-sharing records, one record for each work-sharing or barrier construct encountered.    The record contains the basic information about the construct as provided by the arguments to the arrival function, as well as the distribution chosen for that thread.
     379
     380The constructs are a lot like MPI collective operations, and are modeled similarly.
     381
     382When a thread arrives at one of these constructs, it invokes the relevant arrival function.  At this point you can determine whether this thread is the first to arrive at that construct.  If its queue is empty, it is the first, otherwise it is not first, and the oldest entry in its queue will be the entry corresponding to this construct.
     383
     384When a thread is the first thread to arrive at a construct, a distribution is chosen for every thread and a record is created and enqueued in each thread queue (including the caller).   The distributions can be chosen nondeterministically, possibly with some restrictions to achieve some tractability/soundness compromise.  The record for this thread is then dequeued and the iterator returned.
     385
     386If a thread is not the first to arrive, its record is dequeued and compared with the arguments given in the function call.  They should match, and if they don't, an error is reported.  This indicates that either threads encountered constructs in different orders or the loop parameters changed.
     387
     388
     389== Translations of specific directives ==
     390
     391=== Translating `parallel` ===
     392
     393`parallel`: this spawns some nondeterministic number of threads.  We will assume there is a constant `THREAD_MAX` defined somewhere.  The number of threads created will be between 1 and `THREAD_MAX` (inclusive).  Each thread is assigned an ID.  The original ("master") thread has ID 0.  All threads execute the parallel region.
     394
     395{{{
     396  float x; // shared
     397  int y; // private
     398  #pragma omp parallel shared(x) private(y)
     399  {
     400    ...
     401    x=5.2;
     402    y=3;
     403   ...
     404  }
     405}}}
     406
     407=>
     408
     409{{{
     410  float x;
     411  int y;
     412  { // begin parallel construct
     413    int _nthreads = 1+$choose_int(THREAD_MAX);
     414    $omp_gteam gteam = $omp_gteam_create($here, nthreads);
     415    $omp_gshared x_gshared = $omp_gshared_create(gteam, &x);
     416
     417    $parfor (int _tid : {0..nthreads-1}) {
     418      $omp_team team = $omp_team_create($here, gteam, _tid);
     419      $omp_shared x_shared = $omp_shared_create(team, x_gshared);
     420      int _y; // private variable
     421
     422      ...
     423      { // "x=5.2":
     424        float tmp = 5.2;
     425 
     426        $omp_write(x_shared, x_shared->local, &tmp);
     427      }
     428      _y = 3;
     429      ...
     430     $omp_barrier_and_flush(team); // implicit at end of parallel region
     431     $omp_shared_destroy(x_shared);
     432     $omp_team_destroy(team);
     433    } // end $parfor
     434    $omp_gshared_destroy(x_gshared);
     435    $omp_gteam_destroy(gteam);
     436  } // end parallel construct
     437}}}
     438
     439All variables that occur in the parallel construct, i.e., the lexical extent of the parallel construct, must be determined to be either private or shared.   This is determined by the clauses and the default rules as specified in the OpenMP Standard.  Obviously any variable declared within the construct itself must be private.
     440
     441For all private variables `y` not declared within the parallel construct, create a new variable of the same type, `_y`.    The new variable is declared within the thread scope.  If `y` is also firstprivate,  then `_y` is initialized with the value of `y`, e.g. `int _y=y;`.  Otherwise, `_y` is uninitialized, so has an undefined value.
     442
     443=== Translating `for` ===
     444
     445Try to determine whether the loop iterations are independent.  In that case, they can all be executed by one thread.
     446Otherwise:
     447
     448{{{
     449#pragma omp for
     450for (i=0; i<n; i++) 
     451  S
     452}}}
     453
     454=>
     455
     456{{{
     457{
     458  $domain loop_domain = {0..n-1};
     459  $domain(1) my_iters = ($domain(1))$omp_arrive_loop(team, FOR_LOC++, loop_domain, STRATEGY);
     460
     461  $for (int i : my_iters) {
     462    translate(S);
     463  }
     464  $barrier_and_flush(team);
     465}
     466}}}
     467
     468We can vary the way the sub-domains are chosen to explore different tradeoffs and strategies.  On one extreme, every kind of partition can be explored; on the other, some fixed strategy like round-robin with chunksize 1 can be used.  This only changes the definition of `$omp_arrive_loop`, not the translation above.
     469
     470{{{
     471#pragma omp parallel for collapse(3)
     472for (i=0; i<n; i++)
     473  for (j=0; j<m; j++)
     474    for (k=0; k<l; k++) {
     475      S
     476    }
     477}}}
     478
     479=>
     480
     481{{{
     482{
     483  $domain loop_domain = {0..n-1, 0..m-1, 0..l-1};
     484  $domain(3) my_iters = ($domain(3))$omp_arrive_loop(team, FOR_LOC++, loop_domain, STRATEGY);
     485
     486  $for (int i, j, k : my_iters) {
     487    translate(S);
     488  }
     489  $omp_barrier_and_flush(team);
     490}
     491}}}
     492
     493=== Translating `reduction` clause ===
     494
     495{{{
     496#pragma omp for reduction(+:x,y)
     497for (i=a; i<b; i++) {
     498  S
     499}
     500}}}
     501
     502=>
     503
     504The following translation assumes that $omp_shared has already been created somewhere.
     505
     506{{{
     507{
     508  $domain loop_domain = {a..b-1};
     509  $domain(1) my_iters = ($domain(1))$omp_arrive_loop(team, FOR_LOC++, loop_domain, STRATEGY);
     510  double _x=0.0, _y=0.0; // not the local view of the shared variable x/y
     511
     512  $for (int i : my_iters) {
     513    translate(S) but replace x with _x and y with _y;
     514  }
     515  $omp_apply_assoc(x_shared, CIVL_SUM, &_x);
     516  $omp_apply_assoc(y_shared, CIVL_SUM, &_y);
     517  $omp_barrier_and_flush(team);
     518}
     519}}}
     520
     521
     522
     523=== Translating `sections` ===
     524
     525Say there are `numSections` sections.  This number is known statically.
     526
     527{{{
     528#pragma omp sections
     529#pragma omp section
     530  S0
     531#pragma omp section
     532  S1
     533...
     534}}}
     535
     536=>
     537
     538{{{
     539{
     540  $domain(1) my_secs = $omp_arrive_sections(team, SEC_LOC++, numSections);
     541
     542  $for (int i : my_secs) {
     543    switch (i) {
     544    case 0: {
     545      translate(S0);
     546      break;
     547    }
     548    case 1: {
     549      translate(S1);
     550      break;
     551    }
     552    ...
     553    } /* end of switch */
     554  } /* end of $for loop */
     555  $omp_barrier_and_flush();
     556}
     557}}}
     558
     559
     560=== Translating `single` ===
     561
     562{{{
     563#pragma omp single
     564S
     565}}}
     566
     567=>
     568
     569{{{
     570int owner = $omp_arrive_single(team, SINGLE_LOC++);
     571
     572if (owner == _tid) {
     573  translate(S);
     574}
     575$omp_barrier_and_flush(team);
     576}}}
     577
     578
     579=== Translating `barrier` ===
     580
     581{{{
     582#pragma omp barrier
     583}}}
     584
     585=>
     586
     587{{{
     588$omp_barrier_and_flush(team);
     589}}}
     590
     591=== Translating `critical` ===
     592
     593Basically, use a lock for each critical name, plus one for the "no name".  All threads must obtain lock to enter the critical section, then release it.
     594I.e., if there are critical sections name a, b, and c, there should be global root-scope variables of boolean type named `_critical_noname`, `_critical_a`, etc.
     595
     596{{{
     597#pragma omp critical(a)
     598S
     599}}}
     600
     601=>
     602
     603{{{
     604...
     605_Bool _critical_a = $false;
     606.
     607.
     608.
     609$when (!_critical_a) _critical_a=$true;
     610translate(S);
     611_critical_a=$false;
     612}}}
     613
     614=== Translating `atomic` ===
     615
     616In general, reads and writes to shared variables will be processed using the protocols described above.  However if the operation occurs within an omp atomic construct, it is translated differently.
     617
     618TODO: need to look up the rules on the different flavors of atomics.
     619
     620If sequentially consistent atomic...
     621
     622If non-sequentially consistent atomic...
     623
     624
     625===  Translating`ordered` ===
     626
     627This can only be used inside and OMP `for` loop in which the pragma used the `ordered` clause.  (Check that.)  It indicates that the specified region must be executed in iteration order.
     628
     629In this case the system function must return an int iterator in which the ints occur in loop order.
     630
     631{{{
     632#pragma omp for ordered
     633for (i=a; i<b; i++) {
     634  ...
     635  #pragma omp ordered
     636  S1
     637  ...
     638  #pragma omp ordered
     639  S2
     640  ...
     641}
     642}}}
     643
     644=>
     645
     646{{{
     647{
     648  $domain loop_domain = {a..b};
     649  $domain(1) my_iters = ($domain(1))$omp_arrive_loop(team, FOR_LOC++, loop_domain, STRATEGY);
     650  int order1=a, order2=a;
     651
     652  $for (int i : my_iters) {
     653     ...
     654    $when (order1==i) {
     655      translate(S1);
     656      order1++;
     657    }
     658      ...
     659    $when (order2==i) {
     660      translate(S2);
     661      order2++;
     662    }
     663      ...
     664  }
     665}
     666}}}
     667
     668=== Translating `master` ===
     669
     670{{{
     671#pragma omp master
     672S
     673}}}
     674
     675=>
     676
     677{{{
     678if (_tid == 0) {
     679  translate(S);
     680}
     681}}}
     682
     683=== Translating `nowait` ===
     684
     685Just leave out the `$omp_barrier_and_flush` at the end of the translated construct.
     686