Context Navigation

Changes between Version 77 and Version 78 of Next-GenOpenMPTransformation

Timestamp:: 12/02/21 15:47:30 (5 years ago)
Author:: wuwenhao
Comment:: Distribute the whole content to different wiki pages.

Legend:

: Unmodified
: Added
: Removed
: Modified

Next-GenOpenMPTransformation

-              v77
+              v78
 = INDEX =
-[https://vsl.cis.udel.edu/trac/civl/wiki/Next-GenOpenMPTransformation#a1.TheSequentiallyConsistentSubsetofOpenMP 1. The Sequentially Consistent Subset of OpenMP]\\
-[https://vsl.cis.udel.edu/trac/civl/wiki/Next-GenOpenMPTransformation#a2.SupportedOpenMPFeatures 2. Supported OpenMP Features]\\
-[https://vsl.cis.udel.edu/trac/civl/wiki/Next-GenOpenMPTransformation#a3.OpenMPSimplification 3. OpenMP Simplification]\\
-[https://vsl.cis.udel.edu/trac/civl/wiki/Next-GenOpenMPTransformation#a4.OpenMPTransformations 4. OpenMP Transofrmation]
 = General Ideas of Omp Transformation =
+== 1 [wiki:OpenMP-Transformation-Introduction] ==
+. At a right mover (e.g. lock, enter_critical, enter_atomic), we do yield and push.
+. At a left mover (e.g., unlock, exit_critical, exit_atomic), we do check and then clean (pop).
+    (Note that it only clean whats in between a pair of right mover and left mover.)
+. A barrier is transformed as: barrier() ; $check_data_race(); $track_clean(); barrier();
+. Read and write operations are tracked in $atomic scope and functions/primitives of tracking read/write operations are:
+  - {{{$track{}}} . . . . . . . . . . . . . . . . creates a new pair of read/write sets for tracking.
+  - {{{} // end of $track scope}}} . . removes read/write sets currently used for tracking.
+  - {{{$track_empty();}}} . . . . . . . . . . empties read/write sets currently used for tracking.
+  - {{{$tracked_reads();}}} . . . . . . . . returns a set of tracked shared variables read by its caller process.
+  - {{{$tracked_writes();}}} . . . . . . . returns a set of tracked shared variables written by its caller process.
+. {{{$check_data_race}}} asserts that:
+{{{
+  $forall (int i, j: 0 .. nthreads - 1) (i != j) ==>
+    $mem_disjoint(
+      writes[i],
+      $mem_union(reads[j], writes[j])
+    );
+}}}
+.1 [The Sequentially Consistent Subset of OpenMP]
+E.g.,
+{{{
+#pragma omp parallel shared(s)
+{
+  A;
+  #pragma omp critical(c)
+  {
+    B;
+  }
+  C;
+}
+}}}
+.2 [General Ideas of OpenMP Transformation]
+ ==>
+== 2 [wiki:OpenMP-Transformation-Supported-Features] ==
+{{{
+void $check_data_race($mem *rs, $mem *ws, int n, int tid) {
+  rs[tid] = $tracked_reads();
+  ws[tid] = $tracked_writes();
+  $assert(
+    $forall (int j: 0 .. nthreads - 1) (j != tid) ==>
+      $mem_disjoint(ws[tid], $mem_union(rs[j], ws[j])) &&
+      $mem_disjoint(...)
+  );
+}
+}}}
+{{{
+// Init parallel region env.
+$range thread_range = 0 .. nthreads - 1;
+$domain(1) dom = ($domain){thread_range};
+$omp_gteam gteam = $omp_gteam_create($here, nthreads);
+$mem rs[nthreads], ws[nthreads]; // read sets and write sets.
+for(int i=0; i<nthreads; i++) {
+  ws[i] = $mem_empty();
+  rs[i] = $mem_empty();
+}
+.1 [Supported OpenMP Directives]
+// Transformed parallel region
+$for (int tid: dom) {
+  $depends_on(/*nothing*/)$atomic{
+    $omp_team team = $omp_team_create($here, gteam, tid);
+    $track{ // RS stack: rs0:={}; WS stack: ws0:={};
+      // Enter parallel region
+      A;
+      $yield();
+      $when(critical_c->value == 0);
+      $track{ // Ignore race on lock
+         critical_c->value = 1; // Enter critical block
+      }
+      B;
+      $track{ // Ignore race on lock
+         critical_c->value = 0; // Exit critical block
+      }
+      // RS stack: rs0:={A, B}; WS stack: ws0:={A, B};
+      $check_data_race(&rs, &ws, nthreads, tid); // 1st check
+      $track_clean();
+      rs[tid] = $mem_empty();
+      ws[tid] = $mem_empty();
+      // RS stack: rs0:={}; WS stack: ws0:={};
+      C;
+      $check_data_race(&rs, &ws, nthreads, tid); // 2nd check
+    } // RS stack: <empty>; WS stack: <empty>; empty sets rs0, ws0;
+  } // end $atomic
+} // end $for
+}}}
+.1.1 [`parallel` Construct]
+We denotes that thread x executes statement A as Ax.
+Assuming there are two threads 0 and 1, two traces shall be explored:
+.1.2 [Worksharing Constructs]
+Trace1: A0y0 A1y1 B0cdr0C0cdr0 B1cdr1C1cdr1;
+.1.3 [Loop-Related Directives]
+Trace2: A0y0 A1y1 B1cdr1C1cdr1 B0cdr0C0cdr0;
+.1.4 [Synchronization Constructs and Clauses]
+For the first trace, cdr0 following B0 checks data races between
+.1.5 [Data Environment]
+A0 and A1;
+B0 and A1;
+.2. [Supported OpenMP Runtime Library Types and Rountines]
+Then, w/r sets on thread 0 is cleaned.
+.2.1 [Execution Environment Rountines]
+The second cdr0 following C0 checks data races between
+.2.2 [Lock Routines]
+C0 and A1;
+.2.3 [Timing Routines]
-An implicit barrier is placed at the end after the second cdr0,
-thus thread 1 starts executing its remaining part.
-The first cdr1 checks data races between
-C0 and A1;
-C0 and B1;
-Then, tracked sets on both A1 and B1 are cleaned.
-Finally, the last cdr1 in the first trace checks data races between
-C0 and C1;
-The second trace will complete checking by performing checks between
-B1 and A0
-C1 and A0
-C1 and B0
-Because B is protected by critical section, races between B0 and B1 are not checked.
-As a result, all required pairs are checked.
-= 1. The Sequentially Consistent Subset of OpenMP =
-OpenMP programs that:
-* Do not use non-sequentially consistent atomic directives;
-* Do not rely on the accuracy of a false result from omp_test_lock and omp_test_nest_lock; and
-* Correctly avoid data races as required in Section 1.4.1 on page 23 (OpenMP spec 5.0) ([https://www.openmp.org/wp-content/uploads/OpenMP-API-Specification-5.0.pdf spec])
-The relaxed consistency model is invisible for such programs, and any explicit flush operations in such programs are redundant.
-= 2. Supported OpenMP Features =
-== OpenMP Constructs ==
-* `parallel` ([https://vsl.cis.udel.edu/trac/civl/wiki/Next-GenOpenMPTransformation#Translatingparallel detail])\\
- * `private(`list`)`
- * `firstprivate(list)`
- * `copyin(list)` . . . . . . . . .[Unsupported]
- * `shared(`list`)`
- * `default(`option`)` . . . . . .[Unsupported]
-  * `none`
-  * `shared`
- * `num_threads(`n`)`
- * `reduction(op:list)` ([https://vsl.cis.udel.edu/trac/civl/wiki/Next-GenOpenMPTransformation#Case1:parallelreduction detail])
-* `sections` ([https://vsl.cis.udel.edu/trac/civl/wiki/Next-GenOpenMPTransformation#Translatingsections detail])
- * `private(`list`)`
- * `firstprivate(list)`
- * `lastprivate(list)` . . . . .[In-progress]
- * `reduction(op:list)`
- * `nowait`
-* `section` (see `sections`)
-* `single` ([https://vsl.cis.udel.edu/trac/civl/wiki/Next-GenOpenMPTransformation#Translatingsingle detail])
- * `private(`list`)`
- * `firstprivate(list)`
- * `copyprivate(list)` . . . . .[Unsupported]
- * `nowait`
-* `for` ([https://vsl.cis.udel.edu/trac/civl/wiki/Next-GenOpenMPTransformation#Translatingfor detail])
- * `private(`list`)`
- * `firstprivate(list)`
- * `lastprivate(list)` . . . . .[In-progress]
- * `reduction` ([https://vsl.cis.udel.edu/trac/civl/wiki/Next-GenOpenMPTransformation#Case2:forreduction detail])
- * `schedule` . . . . . . . . . . . . [Unsupported]
- * `collapse`
- * `nowait`
-* `simd` . . . . . . . . . . . . . . . . . . . . [Unsupported]
- * `safelen(n)`
- * `linear(n)`
- * `aligned(n)`
- * `private`
- * `lastprivate`
- * `reduction`
- * `collapse`
-* `for simd` . . . . . . . . . . . . . . . . .[Unsupported]
- * `safelen(n)`
- * `linear(n)`
- * `aligned(n)`
- * `private`
- * `lastprivate`
- * `reduction`
- * `collapse`
- * `firstprivate`
- * `nowait`
- * `schedule`
-* `declare simd` . . . . . . . . . . . . . [Unsupported]
- * `simdlen(n)`
- * `linear`
- * `aligned(n)`
- * `uniform`
- * `inbranch`
- * `notinbranch`
-* `master` ([https://vsl.cis.udel.edu/trac/civl/wiki/Next-GenOpenMPTransformation#Translatingmaster detail])
-* `critical` ([https://vsl.cis.udel.edu/trac/civl/wiki/Next-GenOpenMPTransformation#Translatingcritical detail])
- * `[name]`
-* `barrier` ([https://vsl.cis.udel.edu/trac/civl/wiki/Next-GenOpenMPTransformation#Translatingbarrier detail])
-* `atomic` ([https://vsl.cis.udel.edu/trac/civl/wiki/Next-GenOpenMPTransformation#Translatingatomic detail])
- * atomic-clause
-  * `read`
-  * `write`
-  * `update`
-  * `capture`
- * `seq_cst` . . . . . . . . . . . . . [Unsupported]
-* `ordered` ([https://vsl.cis.udel.edu/trac/civl/wiki/Next-GenOpenMPTransformation#Translatingordered detail])
-== OpenMP Types ==
-* `omp_lock_t` ([https://vsl.cis.udel.edu/trac/civl/wiki/Next-GenOpenMPTransformation#OpenMPSimpleLock1 detail])
-== OpenMP Runtime Library Routines ==
-=== Execution Environment Routines ===
-* `omp_get_max_threads()` (translated as `_omp_thread_max`)
-* `omp_get_num_threads()` (translated as `_omp_nthreads`)
-* `omp_set_num_threads(int x)` (translated as `_omp_num_threads =`/x/)
-* `omp_get_thread_num()`(translated as `_omp_tid`)
-=== Lock Routines ===
-==== OpenMP Simple Lock ====
-* `omp_init_lock(omp_lock_t *)`
-* `omp_destroy_lock(omp_lock_t *)`
-* `omp_set_lock(omp_lock_t *)`
-* `omp_unset_lock(omp_lock_t *)`
-* `omp_test_lock(omp_lock_t *)`
-([https://vsl.cis.udel.edu/trac/civl/wiki/Next-GenOpenMPTransformation#OpenMPSimpleLock1 detail])
-=== Timing Routines ===
-* `omp_get_wtime()` ([https://vsl.cis.udel.edu/trac/civl/browser/CIVL/trunk/src/include/civl/omp.cvl omp.cvl])
-== Plan ==
-* We are going to get rid of the current OMP2CIVL transformer and come up a new transformer that assumes given OpenMP programs are sequentially consistent
-* We are gong to improve the current OmpSimplifier using pointer alias analysis
-* Note that an atomic construct without `seq_cst` is outside of the  sequentially consistent subset of the language, we need a way to deal with that.
-== Notes ==
-* Currently, the simplifier is not aware of the cases that out-of-bound access on multiple dimensional arrays can raise data race.  For example,
-{{{
-int a[10][5];
-#pragma omp parallel for
-for (int i = 0; i < 5; i++)
-  for (int j = 1; j < 10; j++)
-    a[i][j] = a[i][j-1] // a[0][4] and a[1][-1] refer to the same element
-}}}
-The current simplifier will incorrectly sequentialize the example above without realizing the fact that this example is sequentializable if and only if no "logical" out-of-bound happens during the execution.
-A fix for the simplifier could be sequentializing the example with inserted assertion for making sure that there is no "logical" out-of-bound error.
-== Related ==
-* [[wiki: StaticAnalysis| Static Analysis]]
-= 3. OpenMP Simplification =
-== OpenMP Simplifier ==
-* We improve the existing OpenMP simplifier with the informations provided by the [[wiki: StaticAnalysis| Static Analysis]].
-* Example 1: (`DRB067-restrictpointer1-orig-no.c` from [https://github.com/LLNL/dataracebench/releases/tag/v1.2.0 DataRaceBench v1.2.0])
-{{{
-#include <stdlib.h>
-typedef double real8;
-void foo(real8 * restrict newSxx, real8 * restrict newSyy, int length)
+{
-  int i;
-#pragma omp parallel for private (i) firstprivate (length)
-  for (i = 0; i <= length - 1; i += 1) {
-    newSxx[i] = 0.0;
-    newSyy[i] = 0.0;
+  }
+}
-int main()
+{
-  int length=10;
-  real8* newSxx = malloc (length* sizeof (real8));
-  real8* newSyy = malloc (length* sizeof (real8));
-  foo(newSxx, newSyy, length);
-  free (newSxx);
-  free (newSyy);
-  return 0;
+}
-}}}
-  * The OpenMP simplifier analyzes the parallel region with points-to informations about the two pointer argument `newSxx` and `newSyy` from static analysis.
-  * The OpenMP simplifier can determine that no data race will happen in the parallel region as long as no array out-of-bound error happens in it.
-  * The OpenMP simplifier sequentializes the program which will be checked by CIVL.  Any possible array out-of-bound error will be caught by CIVL.
-* Example 2 : ( `DRB014-outofbounds-orig-yes.c` from  [https://github.com/LLNL/dataracebench/releases/tag/v1.2.0 DataRaceBench v1.2.0] )
-{{{
-#include <stdio.h>
-int main(int argc, char* argv[])
+{
-  int i,j;
-  int n=10, m=10;
-  double b[n][m];
-#pragma omp parallel for private(j)
-  for (i=1;i<n;i++)
-    for (j=0;j<m;j++) // Note there will be out of bound access
-      b[i][j]=b[i][j-1];
-  printf ("b[50][50]=%f\n",b[50][50]);
-  return 0;
+}
-}}}
-  * OpenMP Simplifier proves that `b[i][j]` and `b[i][j-1]` that accessed by different threads which hold different values of `i` will have no data race as long as no array out-of-bound error happens.
-  * OpenMP Simplifier sequentializes the OpenMP program while the verification condition for array out-of-bound freedom will be checked by CIVL by default.
-  * The CIVL back-end runs the sequentialized program and detects the array out-of-bound access.
-==  The Current Approach For Model Checking OpenMP Programs ==
-* Example:
-{{{
-#include <omp.h>
-int main()
+{
-  int a[10],i;
-#pragma omp parallel for
-  for(i=0; i<10; i++)
+    {
-      int c;
-      c = i % 2;
-      a[i + c] = 0;
+    }
+}
-}}}
-  * OpenMP Simplifier cannot prove data-race freedom for this program.
-  * OpenMP2CIVL Transformer will transform this program to a equivalent CIVL-C program for model checking. Here is the code snippet for the CIVL-C program after transformation:
-  {{{
-int main() {
-  int  a[10];
+  {
+    {
-      int _elab_i = 0;
-      for (; _elab_i < _omp_num_threads; _elab_i = _elab_i + 1)
+        ;
+    }
-    int $sef$0 = $choose_int(_omp_num_threads);
-    int _omp_nthreads = 1 + $sef$0;
-    $range _omp_thread_range = 0 .. _omp_nthreads - 1;
-    $domain(1) _omp_dom = ($domain){_omp_thread_range};
-    $omp_gteam _omp_gteam = $omp_gteam_create($here, _omp_nthreads);
-    $omp_gshared _omp_a_gshared = $omp_gshared_create(_omp_gteam, &(a));
-    $parfor (int _omp_tid: _omp_dom) {
-      $omp_team _omp_team = $omp_team_create($here, _omp_gteam, _omp_tid);
-      int  _omp_a_local[10];
-      int  _omp_a_status[10];
-      $omp_shared _omp_a_shared = $omp_shared_create(_omp_team, _omp_a_gshared, &(_omp_a_local), &(_omp_a_status));
+      {
-        $range _omp_r1 = 1 .. 10 - 1 # 1;
-        $domain(1) _omp_loop_domain = ($domain){_omp_r1};
-        $domain $sef$1 = $omp_arrive_loop(_omp_team, 0, ($domain)_omp_loop_domain, 2);
-        $domain(1) _omp_my_iters = ($domain(1))$sef$1;
-        $for (int _omp_i_private: _omp_my_iters) {
-          int c;
-          c = _omp_i_private % 2;
-          int _omp_write0;
-          _omp_write0 = 0;
-          $omp_write(_omp_a_shared, &(_omp_a_local[c]), &(_omp_write0));
+        }
+      }
-      $omp_barrier_and_flush(_omp_team);
-      $omp_shared_destroy(_omp_a_shared);
-      $omp_team_destroy(_omp_team);
+    }
-    $omp_gshared_destroy(_omp_a_gshared);
-    $omp_gteam_destroy(_omp_gteam);
+  }
+}
-  }}}
-  * CIVL reports PROVABLE errors:
-  {{{
-Thread 1 can not safely write to memory location &<d5>a[0], because thread 0 has
-written to that memory location and hasn't flushed yet.
-Violation 0 encountered at depth 64:
-CIVL execution violation in p2 (kind: ASSERTION_VIOLATION, certainty: PROVEABLE)
-at OpenMPTransformer "_omp_a_shared, &(_om" inserted by OpenMPTransformer.a_sharedWriteCall before civlc.cvh:105.14-20 "$malloc"
-Assertion: false
- -> false
-. . .
-Call stacks:
-process 0:
-  main at OpenMPTransformer "$parfor (int _omp_ti" inserted by OpenMPTransformer.parallelPragma before civlc.cvh:105.14-20 "$malloc"
-process 1:
-  $barrier_exit at concurrency.cvl:58.2-6 "$when" called from
-  $barrier_call at concurrency.cvl:63.2-14 "$barrier_exit" called from
-  $omp_barrier_and_flush at civl-omp.cvl:322.2-14 "$barrier_call" called from
-  _par_proc0 at OpenMPTransformer "_omp_team" inserted by OpenMPTransformer.barrierAndFlushCall before civlc.cvh:105.14-20 "$malloc"
-process 2:
-  $omp_write at civl-omp.cvl:247.4-10 "$assert" called from
-  _par_proc0 at OpenMPTransformer "_omp_a_shared, &(_om" inserted by OpenMPTransformer.a_sharedWriteCall before civlc.cvh:105.14-20 "$malloc"
-process 3:
-  $omp_arrive_loop at civl-omp.cvl:405.2-8 "$atomic" called from
-  _par_proc0 at OpenMPTransformer "_omp_team, 0, ($doma" inserted by OpenMPTransformer.myItersDeclaration before civlc.cvh:105.14-20 "$malloc"
-Logging new entry 0, writing trace to CIVLREP/testOmp5_0.trace
-Terminating search after finding 1 violation.
-  }}}
-== The New Approach for Model Checking OpenMP programs ==
-* Using CIVL's new infrastructure that captures reads / writes at runtime
-* If the OpenMP Simplifier fails to sequentialize a program,  transform the OpenMP program to the following form for model checking:
-{{{
-$mem writes[nthreads], reads[nthreads];
-$parfor (int tid:0..nthreads-1) {
-  $write_set_push();
-  $read_set_push();
-  block1;
-  // barrier
-  writes[tid] = $write_set_pop();
-  reads[tid] = $read_set_pop();
-  // check for dataraces (collective operation)
-  $write_set_push();
-  $read_set_push();
-  // barrier
-  stmt2;
-  // barrier
-  writes[tid] = $write_set_pop();
-  reads[tid] = $read_set_pop();
-  // check for dataraces (collective operation)
+}
-}}}
-* Functions for managing `$mem` objects can be found in `mem.cvh`
-= 4. OpenMP Transformations =
-== Support Types ==
-* `$omp_gteam`: global team object, represents a team of threads executing in a parallel region.  A handle type.  This is where all the state needed to correctly execute a parallel region will be stored.   This includes a global barrier and a worksharing queue (incomplete array-of-$omp_work_record) for every thread. Definition:
-{{{
-typedef struct OMP_gteam {
-  $scope scope;
-  int nthreads;
-  _Bool init[];
-  $omp_work_record work[][];
-  $gbarrier gbarrier;
-} * $omp_gteam;
-}}}
-* `$omp_team`: local object belonging to a single thread and referencing the global team object.  A handle type. It also includes a local barrier. Definition:
-{{{
-typedef struct OMP_team {
-  $omp_gteam gteam;
-  $scope scope;
-  int tid;
-  $barrier barrier;
-} * $omp_team;
-}}}
-* `$omp_work_record`: the worksharing information that a thread needs for executing a worksharing region. It contains the kind of the worksharing region, the location of the region, the status of the region and the subdomain (iterations/sections/task assigned to the thread).
-{{{
-typedef struct OMP_work_record {
-  int kind; // loop, barrier, sections, or single
-  int location; // location in model of construct
-  _Bool arrived; // has this thread arrived yet?
-  $domain loop_dom;// full loop domain; null if not loop
-  $domain subdomain; // tasks this thread must do
-} $omp_work_record;
-}}}
-== Support Functions ==
-=== Team creation and destruction ===
-* `$omp_gteam $omp_gteam_create($scope scope, int nthreads)`
- * creates new global team object, allocating object in heap in the specified scope.  Number of threads that will be in the team is `nthreads`.
-* `void $omp_gteam_destroy($omp_gteam gteam)`
- * destroys the global team object.  All shared objects associated to the team must have been destroyed before calling this function.
-* `$omp_team $omp_team_create($scope scope, $omp_gteam gteam, int tid)`
- * creates new local team object for a specific thread.
-* `void $omp_team_destroy($omp_team team)`
- * destroys the local team object
-=== Other ===
-* `void $omp_apply_assoc(void * x, $operation op, void * y)`
- * translation of `x = x op y`
-* `void $omp_flush_all($omp_team)`
- * performs an OpenMP flush operation on all shared objects.  This is the default in OpenMP if no argument is specified for a flush construct.
-=== Worksharing and barriers ===
-* `void $omp_barrier($omp_team team)`
- * performs a barrier only.  Note however that usually (always?) a barrier is accompanied by a flush-all, so `$omp_barrier_and_flush` should be used instead.
-* `void $omp_barrier_and_flush($omp_team team)`
- * checks data race violations by examining read & write set intersections
- * combines a barrier and a flush on all shared objects owned by the team.  Implicit in many OpenMP worksharing constructs.
-* `$domain $omp_arrive_loop($omp_team team, int location, $domain loop_dom,  $DecompositionStrategy strategy)`
- * called by a thread when it reaches an omp for loop, this function returns the subset of the loop domain specifying the iterations that this thread will execute.  The dimension of the domain returned equals the dimension of the given domain `loop_dom`.
-* `$domain(1) $omp_arrive_sections($omp_team team, int location, int numSections)`
- * called by a thread when it reaches an omp sections construct, this function returns the subset of the integers 0..numSections-1 specifying the indexes of the sections that this thread will execute.  The sections are numbered from 0 in increasing order.
-* `int $omp_arrive_single($omp_team team, int location)`
- * called by a thread when it reaches on omp single construct, returns the thread ID of the thread that will execute the single construct.
-== Worksharing model ==
-This section describes how the system functions dealing with worksharing are implemented.
-The global data structure `$omp_gteam` contains a FIFO queue for each thread.   The queue contains work-sharing records, one record for each work-sharing or barrier construct encountered.    The record contains the basic information about the construct as provided by the arguments to the arrival function, as well as the distribution chosen for that thread.
-The constructs are a lot like MPI collective operations, and are modeled similarly.
-When a thread arrives at one of these constructs, it invokes the relevant arrival function.  At this point you can determine whether this thread is the first to arrive at that construct.  If its queue is empty, it is the first, otherwise it is not first, and the oldest entry in its queue will be the entry corresponding to this construct.
-When a thread is the first thread to arrive at a construct, a distribution is chosen for every thread and a record is created and enqueued in each thread queue (including the caller).   The distributions can be chosen nondeterministically, possibly with some restrictions to achieve some tractability/soundness compromise.  The record for this thread is then dequeued and the iterator returned.
-If a thread is not the first to arrive, its record is dequeued and compared with the arguments given in the function call.  They should match, and if they don't, an error is reported.  This indicates that either threads encountered constructs in different orders or the loop parameters changed.
-== Translations of specific directives ==
-=== Translating `parallel` ===
-`parallel`: this spawns some nondeterministic number of threads, unless a `num_threads` clause is present, in which case the number of threads is specified exactly..  We will assume there is a constant `THREAD_MAX` defined somewhere.  The number of threads created will be between 1 and `THREAD_MAX` (inclusive).  Each thread is assigned an ID.  The original ("master") thread has ID 0.  All threads execute the parallel region.
-{{{
-  float a; // shared
-  int i; // private
-  int j; // firstprivate
-  #pragma omp parallel shared(a) private(i) firstprivate(j)
+  {
-    BLOCK
+  }
-}}}
-=>
-{{{
-  float a; // shared
-  int i; // private
-  int j; // firstprivate
-  { // parallel construct (begin)
-    int nthreads = 1+$choose_int(num_threads);
-    int fstpvt_j = j;
-    $omp_gteam gteam = $omp_gteam_create($here, nthreads);
-    $parfor (int tid : ($domain){0..nthreads-1}) {
-      $local_start();
-      $omp_team team = $omp_team_create($here, gteam, tid);
-      $read_set_push(); $write_set_push();
-      int i; // private
-      int j = fstpvt_j; // first private init.
-      transfromed(BLOCK)
-      $omp_barrier(team); //check data race included
-      $read_set_pop(); $write_set_pop();
-      $omp_team_destroy(team);
-      $local_end();
+    }
-    $omp_gteam_destroy(gteam);
-  } // parallel construct (end)
-}}}
-All variables that occur in the parallel construct, i.e., the lexical extent of the parallel construct, must be determined to be either private or shared.   This is determined by the clauses and the default rules as specified in the OpenMP Standard.  Obviously any variable declared within the construct itself must be private.
-For all '''private''' variables `i` and `j` (out of `$parfor`) not declared within
-the parallel construct, create new variables of the same type, `i` and `j` (in the `$parfer`).
-The new variable is declared within the thread scope.
-`j` (out of `$parfor`) is also '''firstprivate''',  then `j` (in the `$parfor`) is
-initialized with the value of `fstpvt_j`.
-Otherwise, `i` (in the `$parfor`) is uninitialized, so has an undefined value.
-([https://vsl.cis.udel.edu/trac/civl/wiki/Next-GenOpenMPTransformation#OpenMPConstructs back to top])
-=== Translating `sections` ===
-Say there are `numSections` sections.  This number is known statically.
-{{{
-#pragma omp sections
-#pragma omp section
-  S1 // section 1
-#pragma omp section
-  S2 // section 2
-...
-}}}
-=>
-{{{
+{
-  $domain(1) sections_dist = $omp_arrive_sections(
-               team, section_id++, numSections);
-  $for (int _omp_sid : sections_dist) {
-    if(_omp_sid == 1) { // section 1
-      translate(S1);
+    }
-    if(_omp_sid == 2) { // section 2
-      translate(S2);
+    }
-    ...
+  }
-  $omp_barrier(team);
+}
-}}}
-([https://vsl.cis.udel.edu/trac/civl/wiki/Next-GenOpenMPTransformation#OpenMPConstructs back to top])
-=== Translating `single` ===
-{{{
-#pragma omp single
-  BLOCK
-}}}
-=>
-{{{
-{ // single construct (begin)
-  int _omp_single_dist = $omp_arrive_single(
-        team, single_id++);
-  if (_omp_tid == _omp_single_dist) {
-    translate(BLOCK);
+  }
-  $omp_barrier(team);
-} // single construct (end)
-}}}
-([https://vsl.cis.udel.edu/trac/civl/wiki/Next-GenOpenMPTransformation#OpenMPConstructs back to top])
-=== Translating `for` ===
-Try to determine whether the loop iterations are independent.  In that case, they can all be executed by one thread.
-Otherwise:
-{{{
-#pragma omp for
-for (i=1; i<n; i=i+1) {
-  BLOCK
+}
-}}}
-=>
-{{{
-{ // worksharing-loop construct (begin)
-  // [lb .. ub # step] (lb <= ub)
-  $domain loop_domain = {1 .. n-1 # 1};
-  $domain(1) loop_dist = ($domain(1))$omp_arrive_loop(
-               team, loop_id++, loop_domain, STRATEGY);
-  $for (int i : loop_dist) {
-    transfromed(BLOCK)
+  }
-  $omp_barrier(team);
-} // worksharing-loop construct (end)
-}}}
-We can vary the way the sub-domains are chosen to explore different tradeoffs and strategies.  On one extreme, every kind of partition can be explored; on the other, some fixed strategy like round-robin with chunksize 1 can be used.  This only changes the definition of `$omp_arrive_loop`, not the translation above.
-{{{
-#pragma omp parallel for collapse(3)
-for (i=0; i<n; i++)
-  for (j=0; j<m; j++)
-    for (k=0; k<l; k++) {
-      BLOCK
+    }
-}}}
-=>
-{{{
-{ // worksharing-loop construct (begin)
-  $domain loop_domain = {0..n-1 #1, 0..m-1 #1, 0..l-1 #1};
-  $domain(3) loop_dist = ($domain(3))$omp_arrive_loop(
-               team, loop_id++, loop_domain, STRATEGY);
-  $for (int i, j, k : loop_dist) {
-    transfromed(BLOCK)
+  }
-  $omp_barrier(team);
-} // worksharing-loop construct (end)
-}}}
-([https://vsl.cis.udel.edu/trac/civl/wiki/Next-GenOpenMPTransformation#OpenMPConstructs back to top])
-=== Translating `master` ===
-{{{
-#pragma omp master
-  BLOCK
-}}}
-=>
-{{{
-if (tid == 0) {
-  translate(BLOCK);
+}
-}}}
-([https://vsl.cis.udel.edu/trac/civl/wiki/Next-GenOpenMPTransformation#OpenMPConstructs back to top])
-=== Translating `barrier` ===
-{{{
-#pragma omp barrier
-}}}
-=>
-{{{
-$omp_barrier(team);
-}}}
-([https://vsl.cis.udel.edu/trac/civl/wiki/Next-GenOpenMPTransformation#OpenMPConstructs back to top])
-=== Translating `critical` ===
-Basically, use a lock for each critical name, plus one for the "no name".  All threads must obtain lock to enter the critical section, then release it.
-i.e., if there are critical sections name a, b, and c, there should be global root-scope variables of boolean type named `_critical_noname`, `_critical_a`, etc. \\
-The translation pattern is:
-* '''update R/W sets''' for the caller thread that calls `$read_and_write_set_update(team)`
-* '''wait''' for other threads until they finish updating their R/W sets, by calling `$yield()`
-* '''acquire the lock''', which bonds to this `critical` region.
-* '''check data-race''' by calling `$check_data_race(team)` at each synchronizing point.
-* '''explore''' the  translated `critical` region.
-* '''update R/W sets'''
-* '''wait''' for other threads
-* '''release the lock'''
-* '''check data-race'''
-{{{
-// critical block with unspecified name
-#pragma omp critical
-  BLOCK_A
-// critical block with specified name
-#pragma omp critical(x)
-  BLOCK_X
-}}}
-=>
-{{{
-// global
-$omp_helper_semaphore critical_ = $omp_helper_semaphore_create(0);
-$omp_helper_semaphore critical_x = $omp_helper_semaphore_create(0);
-...
-// critical block with unspecified name
-  $read_and_write_set_update(team);
-  $yield();
-  $omp_helper_semaphore_p(critical_, 0);
-  $omp_helper_semaphore_v(critical_, 1);
-  $check_data_race(team);
-  translate(BLOCK_A)
-  $read_and_write_set_update(team);
-  $yield();
-  $omp_helper_semaphore_v(critical_, 0);
-  $check_data_race(team);
-// critical block with specified name
-  $read_and_write_set_update(team);
-  $yield();
-  $omp_helper_semaphore_p(critical_x, 0);
-  $omp_helper_semaphore_v(critical_x, 1);
-  $check_data_race(team);
-  translate(BLOCK_X);
-  $read_and_write_set_update(team);
-  $yield();
-  $omp_helper_semaphore_v(critical_x, 0);
-  $check_data_race(team);
-}}}
-([https://vsl.cis.udel.edu/trac/civl/wiki/Next-GenOpenMPTransformation#OpenMPConstructs back to top])
-=== Translating `atomic` ===
-In general, the memory location protected by the atomic construct will be isolated by interacting with local intermediate variables, so that data race on other memory locations not protected by this atomic construct can be checked by CIVL.
-The local intermediate variable is named based on the protected variable identifier, the kind of atomic construct and a counter ID (for separating distinct atomic constructs, which have same `atomic-clause` for a same variable).
-Currently, CIVL only support sequentially consistent atomic operations and will NOT handle non-sequentially consistent ones.
-==== `read` ====
-{{{
-#pragma omp atomic read
-  v = x; // int type
-}}}
-=>
-{{{
-int $x_read$0;
-$yield(); // Yield on entering an atomic construct
-$track{ // Will not collect this read on x for checking data race
-  $x_read$0 = x;
+}
-v = $x_read$0;
-// Check data race and clean R/W sets of this thread.
-$check_data_race(&rs, &ws, nthreads, tid);
-$track_clean();
-rs[tid] = $mem_empty();
-ws[tid] = $mem_empty();
-}}}
-==== `write` ====
-{{{
-#pragma omp atomic write
-  x = EXPR; // int type
-}}}
-=>
-{{{
-int $x_write$0 = EXPR;
-$track{ // Will not collect this write on x for checking data race
-  x = $x_write$0;
+}
-}}}
-==== `update` ====
-{{{
-#pragma omp atomic update
-  x = x BIN_OP EXPR; // int type
-}}}
-=>
-{{{
-int $x_update$0 = EXPR;
-$track { // Will not collect this update on x for checking data race
-  x = x BIN_OP $x_update$0;
+}
-}}}
-(All update expression statement can be concluded in this form.)
-==== `capture` ====
-{{{
-#pragma omp atomic capture
+{
-  v = x; // int type
-  x = x BIN_OP EXPR; // int type
+}
-}}}
-=>
-{{{
-int $x_capture$0;
-int $x_capture_delta$0 = EXPR;
-$track { // R/W operations on x will not be collected.
-  $x_capture$0 = x;
-  x = x BIN_OP $x_capture_delta$0;
+}
-v = $x_capture$0;
-}}}
-(The order of the assignment statement of `$x_capture` and the update
-expression statement is corresponding to the order in source code.)
-([https://vsl.cis.udel.edu/trac/civl/wiki/Next-GenOpenMPTransformation#OpenMPConstructs back to top])
-===  Translating`ordered` ===
-This can only be used inside and OMP `for` loop in which the pragma used the `ordered` clause.  (Check that.)  It indicates that the specified region must be executed in iteration order.
-In this case the system function must return an int iterator in which the ints occur in loop order.
-==== Case 1: ordered for-loop without a collapse clause ====
-{{{
-#pragma omp for ordered
-for (i=INIT; i<COND; i+=INCR) {
-  ...
-  #pragma omp ordered
-  BLOCK
-  ...
+}
-}}}
-=>
-{{{
-// global helper lock
-$omp_helper_semaphore ordered_0 = $omp_helper_semaphore_create(INIT);
-  ..
+{
-  $domain loop_domain = {INIT .. COND # INCR};
-  $domain(1) loop_dist = ($domain(1))$omp_arrive_loop(team, loop_id++, loop_domain, STRATEGY);
-  $for (int i : loop_dist) {
-    int i_next = i + I_INCR;
-    ...
-    $read_and_write_set_update(team);
-    $yield();
-    // wait until hs0->val is i
-    $omp_helper_semaphore_p(ordered_0, i);
-    $check_data_race(team);
-    translate(BLOCK);
-    $read_and_write_set_update(team);
-    $yield();
-    // set hs0->val as i_next
-    $omp_helper_semaphore_v(ordered_0, i_next);
-    $check_data_race(team);
-    ...
+  }
+}
-}}}
-==== Case 2: ordered for-loop with a collapse clause ====
-The transformation pattern is:
-* '''record next iteration value''' for all loop variables in the entry of the loop body.
-* '''update R/W sets''' for the current thread.
-* '''wait for other threads''' finishing R/W set updates.
-* '''semaphore P operations''' from the outer-most to the inner-most loop variable.
-  (This operation waits for a "signal" send when its sequentially previous iteration is completed.)
-* '''check data-race''' at any synchronizing point.
-* '''explore''' the translated `ordered` region
-* '''update R/W sets'''
-* '''wait for others'''
-* '''semaphore V operations''' from the inner-most to the outer-most loop variable;
-  (This operation issues a "signal" to enable the execution of its sequentially next iteration.)
-* '''check data-race'''
-{{{
-#pragma omp parallel for collapse(3) ordered
-for (i=I_INIT; i<I_COND; i+=I_INCR)
-  for (j=J_INIT; j<J_COND; j+=J_INCR)
-    for (k=K_INIT; k<K_COND; k+=K_INCR) {
-      #pragma omp ordered
-        BLOCK
+  }
-}}}
-=>
-{{{
-// global helper semaphore
-$omp_helper_semaphore ordered_0 = $omp_helper_semaphore_create(I_INIT);
-$omp_helper_semaphore ordered_1 = $omp_helper_semaphore_create(J_INIT);
-$omp_helper_semaphore ordered_2 = $omp_helper_semaphore_create(K_INIT);
-{ // worksharing-loop construct (begin)
-  ..
-  $for (int i, j, k: loop_dist) {
-    int i_next = i + I_INCR;
-    int j_next = j + J_INCR;
-    int k_next = k + K_INCR;
-    $read_and_write_set_update(team);
-    $yield();
-    $omp_helper_semaphore_p(ordered_0, i);
-    $omp_helper_semaphore_p(ordered_1, j);
-    $omp_helper_semaphore_p(ordered_2, k);
-    $check_data_race(team);
-    translate(BLOCK);
-    $read_and_write_set_update(team);
-    $yield();
-    if (k_next < K_COND) // 'k' deos NOT hit its bound
-      $omp_helper_semaphore_v(ordered_2, k_next); // incr 'k'
-    else {
-      $omp_helper_semaphore_v(ordered_2, K_INIT); // reset 'k'
-      if (j_next < J_COND) // 'j' deos NOT hit its bound
-        $omp_helper_semaphore_v(ordered_1, j_next); // incr 'j'
-      else {
-        $omp_helper_semaphore_v(ordered_1, J_INIT); // reset 'j'
-        $omp_helper_semaphore_v(ordered_0, i_next); // incr 'i'
+      }
+    }
-    $check_data_race(team);
+  }
-  ..
-} // worksharing-loop construct (end)
-}}}
-([https://vsl.cis.udel.edu/trac/civl/wiki/Next-GenOpenMPTransformation#OpenMPConstructs back to top])
-=== Translating `nowait` ===
-Just leave out the `$omp_barrier_and_flush` at the end of the translated construct.
-([https://vsl.cis.udel.edu/trac/civl/wiki/Next-GenOpenMPTransformation#OpenMPConstructs back to top])
-=== Translating `reduction` ===
-==== Case 1: parallel reduction ====
-{{{
-int sum = sum_base;
-int prod = prod_base;
-#pragma parallel shared(a) reduction(+:sum) reduction(*:prod)
+{
-  int tid = omp_get_thread_num();
-  sum += a[tid];
-  prod *= a[tid];
+}
-}}}
-=>
-{{{
-int sum = sum_base;
-int prod = prod_base;
-{ // parallel construct (begin)
-  ..
-  $parfor (..) {
-    ..
-    int *_omp_rdc_0_sum = &sum;
-    int sum = 0;
-    int *_omp_rdc_1_prod = &prod;
-    int prod = 1;
-    int tid = _omp_tid;
-    sum += a[tid];
-    prod *= a[tid];
-    $omp_reduction_combine(ADD, _omp_rdc_0_sum, sum);
-    $omp_reduction_combine(MULT, _omp_rdc_1_prod, prod);
-    $omp_barrier(team); //check data race
-    ..
+  }
-  ..
-} // parallel construct (end)
-}}}
-==== Case 2: for reduction ====
-{{{
-#pragma omp for reduction(+: fsum, isum)
-for (i=a; i<b; i+=c) {
-  BLOCK
-  fsum += EXPR0;
-  isum += EXPR1;
+}
-}}}
-=>
-{{{
-{ // worksharing-loop construct (begin)
-  $domain loop_domain = {a..b-1#c};
-  $domain(1) loop_dist = ($domain(1))$omp_arrive_loop(
-               team, loop_id++, loop_domain, STRATEGY);
-  float *_omp_rdc_0_fsum = &fsum;
-  float fsum = 0.0;
-  int *_omp_rdc_1_isum = &isum;
-  int isum = 0;
-  $for (int i : loop_dist) {
-    translate(BLOCK)
-    fsum += translate(EXPR0);
-    isum += translate(EXPR1);
+  }
-  $omp_reduction_combine(ADD, _omp_rdc_0_fsum, fsum);
-  $omp_reduction_combine(ADD, _omp_rdc_1_isum, isum);
-  $omp_barrier(team);
-} // worksharing-loop construct (end)
-}}}
-In `Case1` the combination is performed in `$parfor` after the completion of
-the last statement in the original structured block.
-In `Case2` the combination is performed immediately after the end of `$for`.
-([https://vsl.cis.udel.edu/trac/civl/wiki/Next-GenOpenMPTransformation#OpenMPConstructs back to top])
-== Translations of specific routines ==
-=== Lock Routines ===
-==== OpenMP Simple Lock ====
-* `omp_lock_t`, `omp_init_lock` and `omp_destroy_lock` stay unchanged ([https://vsl.cis.udel.edu/trac/civl/browser/CIVL/trunk/src/include/civl/omp.cvl omp.cvl])
-* `omp_set_lock` and `omp_unset_lock` are translated as:
-{{{
-  ..
-  omp_set_lock(&lck);
-  BLOCK;
-  omp_unset_lock(&lck);
-  ..
-}}}
-=>
-{{{
-  ..
-  $read_and_write_set_update(team);
-  $yield();
-  $omp_set_lock(&lck, tid);
-  $check_data_race(team);
-  translated(BLOCK);
-  $read_and_write_set_update(team);
-  $yield();
-  $omp_unset_lock(&lck, tid);
-  $check_data_race(team);
-  ..
-}}}
-* `omp_test_lock` is translated as:
-{{{
-  while(! omp_test_lock(&lck)) {
-    .. // Other non-critical task
+  }
-  .. // Critical task
-  omp_unset_lock(&lck);
-}}}
-=>
-{{{
-  while(! omp_test_lock(&lck, tid)) { // no Sync, not wrapped
-    .. // Other non-critical task
+  }
-  .. // Critical task
-  $read_and_write_set_update(team);
-  $yield();
-  $omp_unset_lock(&lck, tid); // Sync required, so wrapped.
-  $check_data_race(team);
-}}}
-([https://vsl.cis.udel.edu/trac/civl/wiki/Next-GenOpenMPTransformation#OpenMPConstructs back to top])