= 1 OpenMP Transformation Introduction = == 1.1 The Sequentially Consistent Subset of OpenMP == The current domain for OpenMP program verification focuses on sequentially consistent programs that: * Do not use non-sequentially consistent atomic directives; * Do not rely on the accuracy of a false result from omp_test_lock and omp_test_nest_lock; and * Correctly avoid data races as required in Section 1.4.1 on page 23 (OpenMP spec 5.0) ([https://www.openmp.org/wp-content/uploads/OpenMP-API-Specification-5.0.pdf spec]) The relaxed consistency model is invisible for such programs, and any explicit flush operations in such programs are redundant. == 1.2 Overview of OpenMP Transformation == === 1.2.1 Data Race Detection by Analyzing !Read/Write Operations === ==== Basic Transformation Pattern ==== 1. At a right mover (e.g. lock, enter_critical, enter_atomic), we do yield and push. 2. At a left mover (e.g., unlock, exit_critical, exit_atomic), we do check and then clean (pop). (Note that it only clean whats in between a pair of right mover and left mover.) 3. A barrier is transformed as: barrier() ; $check_data_race(); $track_clean(); barrier(); 4. Read and write operations are tracked in $atomic scope and functions/primitives of tracking read/write operations are: - {{{$track{}}} . . . . . . . . . . . . . . . . creates a new pair of read/write sets for tracking. - {{{} // end of $track scope}}} . . removes read/write sets currently used for tracking. - {{{$track_empty();}}} . . . . . . . . . . empties read/write sets currently used for tracking. - {{{$tracked_reads();}}} . . . . . . . . returns a set of tracked shared variables read by its caller process. - {{{$tracked_writes();}}} . . . . . . . returns a set of tracked shared variables written by its caller process. 5. {{{$check_data_race}}} asserts that: {{{ $forall (int i, j: 0 .. nthreads - 1) (i != j) ==> $mem_disjoint( writes[i], $mem_union(reads[j], writes[j]) ); }}} E.g., {{{ #pragma omp parallel shared(s) { A; #pragma omp critical(c) { B; } C; } }}} ==> {{{ void $check_data_race($mem *rs, $mem *ws, int n, int tid) { rs[tid] = $tracked_reads(); ws[tid] = $tracked_writes(); $assert( $forall (int j: 0 .. nthreads - 1) (j != tid) ==> $mem_disjoint(ws[tid], $mem_union(rs[j], ws[j])) && $mem_disjoint(...) ); } }}} {{{ // Init parallel region env. $range thread_range = 0 .. nthreads - 1; $domain(1) dom = ($domain){thread_range}; $omp_gteam gteam = $omp_gteam_create($here, nthreads); $mem RS[nthreads], WS[nthreads]; // global read/write sets for threads. for(int i=0; ivalue == 0); $track{ // Ignore race on lock critical_c->value = 1; // Enter critical block } B; // local memset stacks: rs:={{A}, {B}}; ws:={{A}, {B}}; $track{ // Ignore race on lock critical_c->value = 0; // Exit critical block } // update global memsets RS[tid] and WS[tid] in $check_data_race // global memses: RS[tid]:={A}U{B}; WS[tid]:={A}U{B}; $check_data_race(&RS, &WS, nthreads, tid); // 1st check $track_clean(); // local memset stacks: rs:={{}}; ws:={{}}; C; // local memset stacks: rs:={{C}}; ws:={{C}}; // update global memsets RS[tid] and WS[tid] in $check_data_race // global memses: RS[tid]:={C}; WS[tid]:={C}; $check_data_race(&rs, &ws, nthreads, tid); // 2nd check } // RS stack: ; WS stack: ; empty sets rs0, ws0; } // end $atomic } // end $for }}} We denotes that thread x executes statement A as Ax. Assuming there are two threads 0 and 1, two traces shall be explored: Trace1: A0y0 A1y1 B0cdr0C0cdr0 B1cdr1C1cdr1; Trace2: A0y0 A1y1 B1cdr1C1cdr1 B0cdr0C0cdr0; For the first trace, cdr0 following B0 checks data races between A0 and A1; B0 and A1; Then, w/r sets on thread 0 is cleaned. The second cdr0 following C0 checks data races between C0 and A1; An implicit barrier is placed at the end after the second cdr0, thus thread 1 starts executing its remaining part. The first cdr1 checks data races between C0 and A1; C0 and B1; Then, tracked sets on both A1 and B1 are cleaned. Finally, the last cdr1 in the first trace checks data races between C0 and C1; The second trace will complete checking by performing checks between B1 and A0 C1 and A0 C1 and B0 Because B is protected by critical section, races between B0 and B1 are not checked. As a result, all required pairs are checked. == 1.3 OpenMP Simplification == === OpenMP Simplifier === * We improve the existing OpenMP simplifier with the informations provided by the [[wiki: StaticAnalysis| Static Analysis]]. * Example 1: (`DRB067-restrictpointer1-orig-no.c` from [https://github.com/LLNL/dataracebench/releases/tag/v1.2.0 DataRaceBench v1.2.0]) {{{ #include typedef double real8; void foo(real8 * restrict newSxx, real8 * restrict newSyy, int length) { int i; #pragma omp parallel for private (i) firstprivate (length) for (i = 0; i <= length - 1; i += 1) { newSxx[i] = 0.0; newSyy[i] = 0.0; } } int main() { int length=10; real8* newSxx = malloc (length* sizeof (real8)); real8* newSyy = malloc (length* sizeof (real8)); foo(newSxx, newSyy, length); free (newSxx); free (newSyy); return 0; } }}} * The OpenMP simplifier analyzes the parallel region with points-to informations about the two pointer argument `newSxx` and `newSyy` from static analysis. * The OpenMP simplifier can determine that no data race will happen in the parallel region as long as no array out-of-bound error happens in it. * The OpenMP simplifier sequentializes the program which will be checked by CIVL. Any possible array out-of-bound error will be caught by CIVL. * Example 2 : ( `DRB014-outofbounds-orig-yes.c` from [https://github.com/LLNL/dataracebench/releases/tag/v1.2.0 DataRaceBench v1.2.0] ) {{{ #include int main(int argc, char* argv[]) { int i,j; int n=10, m=10; double b[n][m]; #pragma omp parallel for private(j) for (i=1;i int main() { int a[10],i; #pragma omp parallel for for(i=0; i<10; i++) { int c; c = i % 2; a[i + c] = 0; } } }}} * OpenMP Simplifier cannot prove data-race freedom for this program. * OpenMP2CIVL Transformer will transform this program to a equivalent CIVL-C program for model checking. Here is the code snippet for the CIVL-C program after transformation: {{{ int main() { int a[10]; { { int _elab_i = 0; for (; _elab_i < _omp_num_threads; _elab_i = _elab_i + 1) ; } int $sef$0 = $choose_int(_omp_num_threads); int _omp_nthreads = 1 + $sef$0; $range _omp_thread_range = 0 .. _omp_nthreads - 1; $domain(1) _omp_dom = ($domain){_omp_thread_range}; $omp_gteam _omp_gteam = $omp_gteam_create($here, _omp_nthreads); $omp_gshared _omp_a_gshared = $omp_gshared_create(_omp_gteam, &(a)); $parfor (int _omp_tid: _omp_dom) { $omp_team _omp_team = $omp_team_create($here, _omp_gteam, _omp_tid); int _omp_a_local[10]; int _omp_a_status[10]; $omp_shared _omp_a_shared = $omp_shared_create(_omp_team, _omp_a_gshared, &(_omp_a_local), &(_omp_a_status)); { $range _omp_r1 = 1 .. 10 - 1 # 1; $domain(1) _omp_loop_domain = ($domain){_omp_r1}; $domain $sef$1 = $omp_arrive_loop(_omp_team, 0, ($domain)_omp_loop_domain, 2); $domain(1) _omp_my_iters = ($domain(1))$sef$1; $for (int _omp_i_private: _omp_my_iters) { int c; c = _omp_i_private % 2; int _omp_write0; _omp_write0 = 0; $omp_write(_omp_a_shared, &(_omp_a_local[c]), &(_omp_write0)); } } $omp_barrier_and_flush(_omp_team); $omp_shared_destroy(_omp_a_shared); $omp_team_destroy(_omp_team); } $omp_gshared_destroy(_omp_a_gshared); $omp_gteam_destroy(_omp_gteam); } } }}} * CIVL reports PROVABLE errors: {{{ Thread 1 can not safely write to memory location &a[0], because thread 0 has written to that memory location and hasn't flushed yet. Violation 0 encountered at depth 64: CIVL execution violation in p2 (kind: ASSERTION_VIOLATION, certainty: PROVEABLE) at OpenMPTransformer "_omp_a_shared, &(_om" inserted by OpenMPTransformer.a_sharedWriteCall before civlc.cvh:105.14-20 "$malloc" Assertion: false -> false . . . Call stacks: process 0: main at OpenMPTransformer "$parfor (int _omp_ti" inserted by OpenMPTransformer.parallelPragma before civlc.cvh:105.14-20 "$malloc" process 1: $barrier_exit at concurrency.cvl:58.2-6 "$when" called from $barrier_call at concurrency.cvl:63.2-14 "$barrier_exit" called from $omp_barrier_and_flush at civl-omp.cvl:322.2-14 "$barrier_call" called from _par_proc0 at OpenMPTransformer "_omp_team" inserted by OpenMPTransformer.barrierAndFlushCall before civlc.cvh:105.14-20 "$malloc" process 2: $omp_write at civl-omp.cvl:247.4-10 "$assert" called from _par_proc0 at OpenMPTransformer "_omp_a_shared, &(_om" inserted by OpenMPTransformer.a_sharedWriteCall before civlc.cvh:105.14-20 "$malloc" process 3: $omp_arrive_loop at civl-omp.cvl:405.2-8 "$atomic" called from _par_proc0 at OpenMPTransformer "_omp_team, 0, ($doma" inserted by OpenMPTransformer.myItersDeclaration before civlc.cvh:105.14-20 "$malloc" Logging new entry 0, writing trace to CIVLREP/testOmp5_0.trace Terminating search after finding 1 violation. }}} === The New Approach for Model Checking OpenMP programs === * Using CIVL's new infrastructure that captures reads / writes at runtime * If the OpenMP Simplifier fails to sequentialize a program, transform the OpenMP program to the following form for model checking: {{{ $mem writes[nthreads], reads[nthreads]; $parfor (int tid:0..nthreads-1) { $write_set_push(); $read_set_push(); block1; // barrier writes[tid] = $write_set_pop(); reads[tid] = $read_set_pop(); // check for dataraces (collective operation) $write_set_push(); $read_set_push(); // barrier stmt2; // barrier writes[tid] = $write_set_pop(); reads[tid] = $read_set_pop(); // check for dataraces (collective operation) } }}} * Functions for managing `$mem` objects can be found in `mem.cvh` ---- [wiki:Next-GenOpenMPTransformation Back: Index] [wiki:OpenMP-Transformation-CIVL-Types-and-Functions Next: CIVL-OpenMP Types and Functions]