1 OpenMP Transformation Introduction
1.1 The Sequentially Consistent Subset of OpenMP
The current domain for OpenMP program verification focuses on sequentially consistent programs that:
- Do not use non-sequentially consistent atomic directives;
- Do not rely on the accuracy of a false result from omp_test_lock and omp_test_nest_lock; and
- Correctly avoid data races as required in Section 1.4.1 on page 23 (OpenMP spec 5.0) (spec)
The relaxed consistency model is invisible for such programs, and any explicit flush operations in such programs are redundant.
1.2 Overview of OpenMP Transformation
1.2.1 Data Race Detection by Analyzing Read/Write Operations
Basic Transformation Pattern
- At a right mover (e.g. lock, enter_critical, enter_atomic), we do yield and push.
- At a left mover (e.g., unlock, exit_critical, exit_atomic), we do check and then clean (pop).
(Note that it only clean whats in between a pair of right mover and left mover.)
- A barrier is transformed as: barrier() ; $check_data_race(); $track_clean(); barrier();
- Read and write operations are tracked in $atomic scope and functions/primitives of tracking read/write operations are:
$track{. . . . . . . . . . . . . . . . creates a new pair of read/write sets for tracking.} // end of $track scope. . removes read/write sets currently used for tracking.$track_empty();. . . . . . . . . . empties read/write sets currently used for tracking.$tracked_reads();. . . . . . . . returns a set of tracked shared variables read by its caller process.$tracked_writes();. . . . . . . returns a set of tracked shared variables written by its caller process.
$check_data_raceasserts that:$forall (int i, j: 0 .. nthreads - 1) (i != j) ==> $mem_disjoint( writes[i], $mem_union(reads[j], writes[j]) );
E.g.,
#pragma omp parallel shared(s)
{
A;
#pragma omp critical(c)
{
B;
}
C;
}
==>
void $update_memsets($mem *rs, $mem *ws, int tid) {
rs[tid] = $tracked_reads();
ws[tid] = $tracked_writes();
}
void $check_data_race($mem *rs, $mem *ws, int tid, int nthreads) {
$assert(
$forall (int j: 0 .. nthreads - 1) (j != tid) ==>
$mem_disjoint(ws[tid], $mem_union(rs[j], ws[j])) &&
$mem_disjoint(...)
);
}
void $lock(lock_t l) {
$track{ // Ignore race on lock
$when(l->value == 0);
l->value = 1; // Enter critical block
}
}
void $unlock(lock_t l) {
$track{ // Ignore race on lock
l->value = 0; // Exit critical block
}
}
// Init parallel region env.
$range thread_range = 0 .. nthreads - 1;
$domain(1) dom = ($domain){thread_range};
$omp_gteam gteam = $omp_gteam_create($here, nthreads);
$mem RS[nthreads], WS[nthreads]; // global read/write sets for threads.
for(int i=0; i<nthreads; i++) {
WS[i] = $mem_empty();
RS[i] = $mem_empty();
}
// Transformed parallel region
$for (int tid: dom) {
$depends_on(/*nothing*/)$atomic{
$omp_team team = $omp_team_create($here, gteam, tid);
$track{
// Enter parallel region
A;
$update_memsets(&RS, &WS, tid);
$yield();
$lock(&critical_c); // Optional for atomic
B;
$unlock(critical_c); // Optional for atomic
$update_memsets(&RS, &WS, tid);
$check_data_race(&RS, &WS, tid, nthreads); // 1st check
$track_clean();
C;
$update_memsets(&RS, &WS, tid);
$check_data_race(&rs, &ws, tid, nthreads); // 2nd check
} // RS stack: <empty>; WS stack: <empty>; empty sets rs0, ws0;
} // end $atomic
} // end $for
We denotes that thread x executes statement A as Ax. Assuming there are two threads 0 and 1, two traces shall be explored:
Trace1: A0y0 A1y1 B0cdr0C0cdr0 B1cdr1C1cdr1;
Trace2: A0y0 A1y1 B1cdr1C1cdr1 B0cdr0C0cdr0;
For the first trace, cdr0 following B0 checks data races between
A0 and A1; B0 and A1;
Then, w/r sets on thread 0 is cleaned.
The second cdr0 following C0 checks data races between
C0 and A1;
An implicit barrier is placed at the end after the second cdr0, thus thread 1 starts executing its remaining part. The first cdr1 checks data races between
C0 and A1; C0 and B1;
Then, tracked sets on both A1 and B1 are cleaned.
Finally, the last cdr1 in the first trace checks data races between
C0 and C1;
The second trace will complete checking by performing checks between
B1 and A0 C1 and A0 C1 and B0
Because B is protected by critical section, races between B0 and B1 are not checked. As a result, all required pairs are checked.
1.3 OpenMP Simplification
OpenMP Simplifier
- We improve the existing OpenMP simplifier with the informations provided by the Static Analysis.
- Example 1: (
DRB067-restrictpointer1-orig-no.cfrom DataRaceBench v1.2.0)#include <stdlib.h> typedef double real8; void foo(real8 * restrict newSxx, real8 * restrict newSyy, int length) { int i; #pragma omp parallel for private (i) firstprivate (length) for (i = 0; i <= length - 1; i += 1) { newSxx[i] = 0.0; newSyy[i] = 0.0; } } int main() { int length=10; real8* newSxx = malloc (length* sizeof (real8)); real8* newSyy = malloc (length* sizeof (real8)); foo(newSxx, newSyy, length); free (newSxx); free (newSyy); return 0; }- The OpenMP simplifier analyzes the parallel region with points-to informations about the two pointer argument
newSxxandnewSyyfrom static analysis. - The OpenMP simplifier can determine that no data race will happen in the parallel region as long as no array out-of-bound error happens in it.
- The OpenMP simplifier sequentializes the program which will be checked by CIVL. Any possible array out-of-bound error will be caught by CIVL.
- The OpenMP simplifier analyzes the parallel region with points-to informations about the two pointer argument
- Example 2 : (
DRB014-outofbounds-orig-yes.cfrom DataRaceBench v1.2.0 )#include <stdio.h> int main(int argc, char* argv[]) { int i,j; int n=10, m=10; double b[n][m]; #pragma omp parallel for private(j) for (i=1;i<n;i++) for (j=0;j<m;j++) // Note there will be out of bound access b[i][j]=b[i][j-1]; printf ("b[50][50]=%f\n",b[50][50]); return 0; }- OpenMP Simplifier proves that
b[i][j]andb[i][j-1]that accessed by different threads which hold different values ofiwill have no data race as long as no array out-of-bound error happens. - OpenMP Simplifier sequentializes the OpenMP program while the verification condition for array out-of-bound freedom will be checked by CIVL by default.
- The CIVL back-end runs the sequentialized program and detects the array out-of-bound access.
- OpenMP Simplifier proves that
Notes
- The OpenMP simplification transformation shall use pointer alias analysis
- The simplifier currently cannot detect data race caused by out-of-bound access on multiple dimensional arrays. The simplification transformer incorrectly sequentializes the example below without realizing that it can be sequentialized if and only if no 'logical' out-of-bound happens during the execution. A possible fix for this issue could be sequentializing the example with inserted assertions for ensuring the absence of 'logical' out-of-bound error.
e.g.,
int a[10][5]; #pragma omp parallel for for (int i = 0; i < 5; i++) for (int j = 1; j < 10; j++) a[i][j] = a[i][j-1]; //a[0][4] and a[1][-1] refer to the same element
1.4 Model Checking OpenMP Programs
The Current Approach For Model Checking OpenMP Programs
- Example:
#include <omp.h> int main() { int a[10],i; #pragma omp parallel for for(i=0; i<10; i++) { int c; c = i % 2; a[i + c] = 0; } }- OpenMP Simplifier cannot prove data-race freedom for this program.
- OpenMP2CIVL Transformer will transform this program to a equivalent CIVL-C program for model checking. Here is the code snippet for the CIVL-C program after transformation:
int main() { int a[10]; { { int _elab_i = 0; for (; _elab_i < _omp_num_threads; _elab_i = _elab_i + 1) ; } int $sef$0 = $choose_int(_omp_num_threads); int _omp_nthreads = 1 + $sef$0; $range _omp_thread_range = 0 .. _omp_nthreads - 1; $domain(1) _omp_dom = ($domain){_omp_thread_range}; $omp_gteam _omp_gteam = $omp_gteam_create($here, _omp_nthreads); $omp_gshared _omp_a_gshared = $omp_gshared_create(_omp_gteam, &(a)); $parfor (int _omp_tid: _omp_dom) { $omp_team _omp_team = $omp_team_create($here, _omp_gteam, _omp_tid); int _omp_a_local[10]; int _omp_a_status[10]; $omp_shared _omp_a_shared = $omp_shared_create(_omp_team, _omp_a_gshared, &(_omp_a_local), &(_omp_a_status)); { $range _omp_r1 = 1 .. 10 - 1 # 1; $domain(1) _omp_loop_domain = ($domain){_omp_r1}; $domain $sef$1 = $omp_arrive_loop(_omp_team, 0, ($domain)_omp_loop_domain, 2); $domain(1) _omp_my_iters = ($domain(1))$sef$1; $for (int _omp_i_private: _omp_my_iters) { int c; c = _omp_i_private % 2; int _omp_write0; _omp_write0 = 0; $omp_write(_omp_a_shared, &(_omp_a_local[c]), &(_omp_write0)); } } $omp_barrier_and_flush(_omp_team); $omp_shared_destroy(_omp_a_shared); $omp_team_destroy(_omp_team); } $omp_gshared_destroy(_omp_a_gshared); $omp_gteam_destroy(_omp_gteam); } } - CIVL reports PROVABLE errors:
Thread 1 can not safely write to memory location &<d5>a[0], because thread 0 has written to that memory location and hasn't flushed yet. Violation 0 encountered at depth 64: CIVL execution violation in p2 (kind: ASSERTION_VIOLATION, certainty: PROVEABLE) at OpenMPTransformer "_omp_a_shared, &(_om" inserted by OpenMPTransformer.a_sharedWriteCall before civlc.cvh:105.14-20 "$malloc" Assertion: false -> false . . . Call stacks: process 0: main at OpenMPTransformer "$parfor (int _omp_ti" inserted by OpenMPTransformer.parallelPragma before civlc.cvh:105.14-20 "$malloc" process 1: $barrier_exit at concurrency.cvl:58.2-6 "$when" called from $barrier_call at concurrency.cvl:63.2-14 "$barrier_exit" called from $omp_barrier_and_flush at civl-omp.cvl:322.2-14 "$barrier_call" called from _par_proc0 at OpenMPTransformer "_omp_team" inserted by OpenMPTransformer.barrierAndFlushCall before civlc.cvh:105.14-20 "$malloc" process 2: $omp_write at civl-omp.cvl:247.4-10 "$assert" called from _par_proc0 at OpenMPTransformer "_omp_a_shared, &(_om" inserted by OpenMPTransformer.a_sharedWriteCall before civlc.cvh:105.14-20 "$malloc" process 3: $omp_arrive_loop at civl-omp.cvl:405.2-8 "$atomic" called from _par_proc0 at OpenMPTransformer "_omp_team, 0, ($doma" inserted by OpenMPTransformer.myItersDeclaration before civlc.cvh:105.14-20 "$malloc" Logging new entry 0, writing trace to CIVLREP/testOmp5_0.trace Terminating search after finding 1 violation.
The New Approach for Model Checking OpenMP programs
- Using CIVL's new infrastructure that captures reads / writes at runtime
- If the OpenMP Simplifier fails to sequentialize a program, transform the OpenMP program to the following form for model checking:
$mem writes[nthreads], reads[nthreads]; $parfor (int tid:0..nthreads-1) { $write_set_push(); $read_set_push(); block1; // barrier writes[tid] = $write_set_pop(); reads[tid] = $read_set_pop(); // check for dataraces (collective operation) $write_set_push(); $read_set_push(); // barrier stmt2; // barrier writes[tid] = $write_set_pop(); reads[tid] = $read_set_pop(); // check for dataraces (collective operation) } - Functions for managing
$memobjects can be found inmem.cvh
